THE ANNALS OF MATHEMATICAL STATISTICS

(FOUNDED BY H. C. CARVER)

THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS


Contents 


Locally Best Unbiased Estimates. E. W. BARANKIN 


A Sequential Decision Procedure for Choosing one of Three Hypotheses
Concerning the Unknown Mean of a Normal Distribution. MILTON SOBEL
AND ABRAHAM WALD

Moments of Random Group Size Distributions. JOHN W. TUKEY

The Power of the Classical Tests Associated with the Normal Distribution.
J. WOLFOWITZ

Application of the Method of Mixtures to Quadratic Forms in Normal
Variates. HERBERT ROBBINS AND E. J. G. PITMAN

The Joint Distribution of Serial Correlation Coefficients. M. H. QUENOUILLE

On the Estimation of the Number of Classes in a Population. LEO A.
GOODMAN

Concerning Compound Randomization in the Binary System. JOHN E. WALSH

The Distribution of Extreme Values in Samples Whose Members are Subject
to a Markoff Chain Condition. BENJAMIN EPSTEIN

Notes:

Note on the Consistency of the Maximum Likelihood Estimate.
ABRAHAM WALD

On Wald's Proof of the Consistency of the Maximum Likelihood Estimate.
J. WOLFOWITZ

A Note on Random Walk. HERBERT T. DAVID

Numerical Integration for Linear Sums of Exponential Functions.
ROBERT E. GREENWOOD

Smoothest Approximation Formulas. ARTHUR SARD

On the Power Function of the "Best" t-test Solution of the
Behrens-Fisher Problem. JOHN E. WALSH

A Note on Fisher's Inequality for Balanced Incomplete Block Designs.
R. C. BOSE


Abstracts of Papers 
News and Notices 
Report on the Boulder Meeting of the Institute 


Vol. XX, No. 4— December, 1949 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor 


M. S. BARTLETT          HARALD CRAMÉR          J. NEYMAN
WILLIAM G. COCHRAN      W. EDWARDS DEMING      WALTER A. SHEWHART
ALLEN T. CRAIG          J. L. DOOB             JOHN W. TUKEY
C. C. CRAIG             W. FELLER
                        HAROLD HOTELLING

WITH THE COOPERATION OF

T. W. ANDERSON, JR.     CHURCHILL EISENHART    H. B. MANN
DAVID BLACKWELL         M. A. GIRSHICK         ALEXANDER M. MOOD
J. H. CURTISS           PAUL R. HALMOS         FREDERICK MOSTELLER
J. F. DALY              PAUL G. HOEL           H. E. ROBBINS
HAROLD F. DODGE         MARK KAC               HENRY SCHEFFÉ
PAUL S. DWYER           E. L. LEHMANN          JACOB WOLFOWITZ
                        WILLIAM G. MADOW


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, C. H. Fischer, Business Administration
Building, University of Michigan, Ann Arbor, Mich.


Changes in mailing address which are to become effective for a given issue 
should be reported to the Secretary on or before the 15th of the month preceding 
the month of that issue. The months of issue are March, June, September and 
December. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
after December 31, 1949 should be sent to T. W. Anderson, Department of
Mathematical Statistics, Columbia University, New York 27, New York.
Manuscripts should be typewritten double-spaced with wide margins, and the
original copy should be submitted. Footnotes should be reduced to a minimum and
whenever possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $10.00 inside the Western
Hemisphere and $5.00 elsewhere. Single copies $3.00. Back numbers are available
at $10.00 per volume or $3.00 per single issue.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879. 














LOCALLY BEST UNBIASED ESTIMATES¹


By E. W. BARANKIN
University of California, Berkeley 


Summary. The problem of unbiased estimation, restricted only by the
postulate of section 2, is considered here. For a chosen number $s > 1$, an unbiased
estimate of a function $g$ on the parameter space is said to be best at the parameter
point $\theta_0$ if its $s$th absolute central moment at $\theta_0$ is finite and not greater than that
for any other unbiased estimate. A necessary and sufficient condition is obtained
for the existence of an unbiased estimate of $g$. When one exists, the best one is
unique. A necessary and sufficient condition is given for the existence of only
one unbiased estimate with finite $s$th absolute central moment. The $s$th absolute
central moment at $\theta_0$ of the best unbiased estimate (if it exists) is given explicitly
in terms of only the function $g$ and the probability densities. It is, to be more
precise, specified as the l.u.b. of a certain set $\mathcal{A}$ of numbers. The best estimate is
then constructed (as a limit of a sequence of functions) with the use of only the
data (relating to $g$ and the densities) associated with any particular sequence
in $\mathcal{A}$ which converges to the l.u.b. of $\mathcal{A}$.

The case $s = \infty$ is considered apart. The case $s = 2$ is studied in greater
detail. Previous results of several authors are discussed in the light of the present
theory. Generalizations of some of these results are deduced. Some examples
are given to illustrate the applications of the theory.


1. Introduction. Let $\Omega$ be a space of points $x$, and $\mu$ be a totally additive
measure defined on a $\sigma$-field $\mathfrak{F}$ of subsets of $\Omega$. Let $\mathfrak{P} = \{p_\theta, \theta \in \Theta\}$ be a family of
probability densities in $\Omega$ with respect to the measure $\mu$. $\Theta$ is any index set; we
lay down no conditions on its structure. We are concerned here with the existence
and characterization of unbiased estimates of a real-valued function $g$ on $\Theta$,
which are in some suitable sense "best" for a prescribed parameter point $\theta_0$.
That is, a real-valued, measurable ($\mu$) function $f_0$ on $\Omega$ such that

$$\int_\Omega f_0\, p_\theta\, d\mu = g(\theta), \qquad \theta \in \Theta, \tag{1}$$

and which satisfies a specified criterion of bestness for $\theta = \theta_0$. This criterion is
usually taken to be

$$\int_\Omega (f_0 - g(\theta_0))^2\, p_{\theta_0}\, d\mu \le \int_\Omega (f - g(\theta_0))^2\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M}, \tag{2}$$

where $\mathfrak{M}$ denotes the class of all unbiased estimates of $g$; i.e., the class of all $f$
satisfying (1). The obvious advantage in the definition (2) is the algebraic
pliability. The obvious disadvantage is that $\mathfrak{M}$ may contain no estimate with
finite variance (cf. section 9).


¹ This article was prepared while the author was under contract with the Office of Naval
Research.

For the investigation of the fundamental questions, posed above, relating to
unbiased estimates, we shall not restrict ourselves to (2). We consider chosen
and fixed, a number $s > 1$, and lay down the

DEFINITION. $f_0 \in \mathfrak{M}$ is best at $\theta_0$ if

$$\infty > \int_\Omega |f_0 - g(\theta_0)|^s\, p_{\theta_0}\, d\mu \le \int_\Omega |f - g(\theta_0)|^s\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M}.$$

With this, and under the condition of a rather natural postulate on $\mathfrak{P}$ (cf. section
2), we exhibit a necessary and sufficient condition for the existence of an unbiased
estimate of $g$ having a finite $s$th absolute central moment at $\theta_0$.²

Except for the discussion, in section 3, of the case in which $g$ is constant on $\Theta$,
we do not consider directly the estimation of $g$, but rather that of $h = g - g(\theta_0)$.
Lemma 1, of section 2, gives the solution of the problem for $g$ when that for $h$ is
known. After section 3, it is assumed exclusively that $h$ is not $\equiv 0$, except where
the contrary is explicitly stated.

In case $s$ is finite, the existence theorem, section 4, Theorem 2, asserts also the
uniqueness of the best unbiased estimate of $h$. It is interesting to observe the
similarity between the proof of this uniqueness and Fisher's proof of the (what
might be called) asymptotic uniqueness of an efficient estimator [2, pp. 704, 705].
The case $s = \infty$³ is discussed in section 5; in this case we find that, in general,
the best estimate is not unique. However, for $s$ both finite and infinite, and as
well when $g$ is constant ($\therefore h \equiv 0$), we give a necessary and sufficient condition
that there be a unique unbiased estimate with finite s.a.c.m.⁴ (cf. section 4,
Corollary 2-1, and section 5, Theorem 3 (iii)).

Theorem 2 determines the s.a.c.m. of the best estimate as the l.u.b. of a set of
numbers given explicitly; and thereby, in particular, throws open the class of
all lower bounds of the minimum s.a.c.m. Investigations after such lower bounds,
in the classical case $s = 2$, have led to the well-known results of Cramér-Rao
[3, p. 480, (32.3.3)], and Bhattacharyya [4, p. 3, (1.10)]. In section 6, which is
devoted to obtaining various special lower bounds, we show how those particular
bounds fall out. It should be remarked, however, that our conditions on $\mathfrak{P}$ are
in general different from those of the above authors.


² For the case $s = 2$ an alternative existence condition, antedating these results, but not
yet published, has been obtained by C. Stein.

³ If we use, in the above definition, the $s$th root of the $s$th absolute central moment,
instead of the latter itself, then the bestness criterion for $s = \infty$ is the limiting criterion
for $s \to \infty$; viz.,

$$\infty > \operatorname*{ess\,sup}_{x \in \Omega} |f_0 - g(\theta_0)| \le \operatorname*{ess\,sup}_{x \in \Omega} |f - g(\theta_0)|, \qquad f \in \mathfrak{M},$$

where ess. sup. refers to the measure $\nu(A) = \int_A p_{\theta_0}\, d\mu$.

⁴ The abbreviation s.a.c.m. will henceforth be used to indicate $s$th absolute central
moment at $\theta_0$.

















In section 7 we give, in Theorem 7 and its corollary, a construction of the 
best estimate, depending only on the knowledge of the minimum s.a.c.m. The 
latter, as indicated in the preceding paragraph, is always known independently 
of any knowledge of the best estimate. We use these results to obtain explicitly 
(Theorems 8 and 9) the best estimates, for arbitrary s, in two cases where 
we assume the minimum s.a.c.m. known. These cases, when s = 2, give the 
minimum variance as determined by the equality sign in the Cramér-Rao and 
Bhattacharyya inequalities, respectively. 

Section 8 is given to a brief discussion of the special case s = 2. Finally, in 
section 9, we present a detailed study of an example. 

At the suggestion of the referee we have added an appendix in which is given a 
brief running description of the fundamental ideas of Banach spaces that come 
into use here. The italicized phrases are those mentioned explicitly in the course 
of the paper. 

We shall merely mention here certain points which will be elaborated further
in future communications. (1) The general theory developed here pertains as
well to sequential as to nonsequential estimation; one has only to make the
proper identification of $\Omega$, $\mathfrak{F}$, $\mu$, and $\mathfrak{P}$. Moreover, as applied to sequential
estimation, the theory will determine the optimum stopping regions. (2) The
discussion of section 5 below can be carried through with "ess. sup." referring
to the measure $\mu$, and $\mathfrak{L}_1$ being the space of functions on $\Omega$ which are integrable
($\mu$); and for this, no restrictions whatsoever on the densities $p_\theta$ are required
(cf. the postulate of section 2), since the $p_\theta$ are elements of this $\mathfrak{L}_1$ solely by
virtue of their properties as probability densities. This development would, for
example, be sufficient to yield the estimate of Girshick, Mosteller, and Savage [5]
in the case of sequential binomial estimation. Also, this unrestricted analysis is
fundamental for the problem of similar regions (a case of the bounded unbiased
estimation of a constant function). (3) For any $s > 1$ it may be observed in the
result of Theorem 7 below, that the best (at $\theta_0$) estimate depends only on a
sufficient statistic; this is clear from Neyman's theorem on sufficient statistics
[6], since the best estimate depends only on ratios of the density functions $p_\theta$.
But more than this, Blackwell's method [7] of deriving a uniformly (over the
parameter set) better unbiased estimate from a given unbiased estimate can be
proved to remain valid also when the measure of dispersion is the $s$th absolute
central moment, $s > 1$. And for this, the postulate of section 2 is not required.
(4) Finally, we point out that, with the proper specializations of $\Theta$, Cramér's
theorem on the ellipsoid of concentration [8], Bhattacharyya's multidimensional
inequality [9], and the extensions of the Rao, Cramér, and Bhattacharyya
bounds to sequential estimation (as, for example, by Blackwell and Girshick
[1], Wolfowitz [10], and Seth [20]) can be drawn from Theorem 4 below.

The inspiration for the mode of analysis in the following pages, and the
major part of its substance, come from F. Riesz: his book [11, Ch. III] and the
article [12] (in particular sections 8-11 thereof). In strictly mathematical
terminology, Theorems 2 and 3 are given in [11] for the sequence-spaces $l_r$; and
Theorem 2 in [12] for the spaces $\mathfrak{L}_r$ of functions on the real interval [0, 1] with
Lebesgue integrable $r$th powers. The proofs are given there for the case of a
denumerable set $\Theta$; in [12] an indication is given of the extension to a
non-denumerable $\Theta$. Our proof of Theorems 2 and 3, however, follows that given by
Banach [13, p. 74] for the case of denumerable $\Theta$. It is based on two results, a
theorem of Hahn-Banach [13, p. 55, Theorem 4], and the representation theorem
(suitable for the general type of $\mathfrak{L}_r$ that we consider) for bounded linear
functionals on $\mathfrak{L}_r$ [14, p. 338, Theorem 46]. The first of these, and the representation
theorem for any $r > 1$, spring in fact from the same article [12, p. 475] of Riesz.
In the case $r = 1$, the representation theorem is due originally to Steinhaus [15];
in the case $r = 2$, it was developed simultaneously in 1907 by Riesz [16] and
Fréchet [17].

Riesz' proofs of the sufficiency of the condition in Theorem 2 proceed by
constructing an explicit sequence of functions on $\Omega$ which converge strongly in
$\mathfrak{L}_s$ to the (in the present statistical terminology) best estimate. Precisely, if in
Theorem 7 below, we take, for each $n = 1, 2, \cdots$, the numbers $a_1^n, a_2^n, \cdots, a_{k_n}^n$
so that the expression

$$\frac{\left| \sum_{i=1}^{k_n} a_i^n\, h(\theta_i^n) \right|}{\left\| \sum_{i=1}^{k_n} a_i^n\, \pi_{\theta_i^n} \right\|_r}$$

is maximum, then the assertion of this theorem is that of Riesz. However,
Theorem 7 is established here without this strict requirement on the $a_i^n$. The
dropping of this restriction was essential for the proofs of Theorems 8 and 9.
The latter two theorems are, in fact, proved with the use of Corollary 7-1,
which is an even stronger result than Theorem 7. This corollary falls out of the
proof of Theorem 7 immediately, in consequence of our use of Lemma 2 for that
proof. The lemma, moreover, eases the proof of Theorem 7 markedly, in doing
away with the need for any differentiation.


2. Preliminary considerations. We begin then by introducing the absolutely
continuous (with respect to $\mu$) measure, defined on $\mathfrak{F}$,

$$\nu(A) = \int_A p_{\theta_0}\, d\mu, \qquad A \in \mathfrak{F}.$$

A function $\phi$ is summable ($\nu$) over $\Omega$ if and only if $\phi \cdot p_{\theta_0}$ is summable ($\mu$) over
$\Omega$; and we have

$$\int_\Omega \phi\, d\nu = \int_\Omega \phi \cdot p_{\theta_0}\, d\mu$$

(cf. [18, pp. 36-38]). Assuming that each of the ratios

$$\pi_\theta(x) = \frac{p_\theta(x)}{p_{\theta_0}(x)}$$

is defined almost everywhere ($\mu$) throughout $\Omega$, it follows that $f$ is an unbiased
estimate of $g$ if and only if

$$\int_\Omega f\, \pi_\theta\, d\nu = g(\theta), \qquad \theta \in \Theta. \tag{3}$$

We define

$$h(\theta) = g(\theta) - g(\theta_0).$$

Since

$$\int_\Omega \pi_\theta\, d\nu = 1, \qquad \theta \in \Theta,$$

it is clear from (3) that $f$ is an unbiased estimate of $g$ if and only if $f - g(\theta_0)$ is
an unbiased estimate of $h$. Moreover, $f$ is best, for $g$, at $\theta_0$ if and only if $f - g(\theta_0)$
is best, for $h$, at $\theta_0$.

Define

$$r = \frac{s}{s - 1},$$

and let $\mathfrak{L}_r$ and $\mathfrak{L}_s$ be the spaces, normed in the usual way, of real-valued functions
on $\Omega$, with summable ($\nu$) absolute $r$th and $s$th powers, respectively. We denote
the respective norms by $\|\cdot\|_r$ and $\|\cdot\|_s$; that is, if $\xi \in \mathfrak{L}_r$ and $\eta \in \mathfrak{L}_s$,

$$\|\xi\|_r = \left( \int_\Omega |\xi|^r\, d\nu \right)^{1/r}, \qquad \|\eta\|_s = \left( \int_\Omega |\eta|^s\, d\nu \right)^{1/s}.$$

We note that these spaces, for $s < \infty$, are weakly compact (cf. [21]). This
property will be used in the proof of Theorem 7. Also, we shall make explicit use
of the representation theorem for linear functionals on $\mathfrak{L}_r$ [14, p. 338, Theorem 46].
The assumptions on $\mathfrak{P}$, or on $\mathfrak{P}_0 = \{\pi_\theta, \theta \in \Theta\}$, will now be the following.
POSTULATE: The functions $\pi_\theta$ are defined almost everywhere ($\mu$) in $\Omega$, and are
elements of $\mathfrak{L}_r$.
The foregoing considerations combine to give the following equivalence.
LEMMA 1. $\phi_0 + g(\theta_0)$ is an unbiased estimate of $g$, which is best at $\theta_0$, if and
only if (i) $\phi_0$ satisfies the equations

$$\int_\Omega \phi_0\, \pi_\theta\, d\nu = h(\theta), \qquad \theta \in \Theta, \tag{4}$$

and (ii) when $\phi$ is any other function satisfying (4), we have

$$\|\phi\|_s \ge \|\phi_0\|_s;$$

that is, if and only if $\phi_0$ is an unbiased estimate of $h$ with minimum (finite) norm in
$\mathfrak{L}_s$. The s.a.c.m. of $\phi_0 + g(\theta_0)$ is precisely $\|\phi_0\|_s^s$.
Starting with section 4, we shall deal directly with the estimation of $h$.


3. The case of constant g. Throughout the remainder (section 4 et seq.) of
this article, the function $h$ is assumed, unless the contrary is explicitly stated,
to be non-constant; that is, since $h(\theta_0) = 0$, not $\equiv 0$. We can, and shall in this
section, obtain the results of the desired kind for the case of a constant function $g$,
by a brief, direct attack.

Let $g(\theta) \equiv g_0$, a constant. Then of course $h(\theta) \equiv 0$. One unbiased estimate of $g$
is immediately obvious, viz., $f_1(x) \equiv g_0$. The s.a.c.m. of $f_1$ is 0.

There will exist other⁵ unbiased estimates of $g$ with finite s.a.c.m. if and
only if there exist non-null unbiased estimates, in $\mathfrak{L}_s$, of $0 \equiv h$. That is, by virtue
of the isomorphism between $\mathfrak{L}_s$ and the space of linear functionals on $\mathfrak{L}_r$, there
will exist an unbiased estimate of $g$ with finite s.a.c.m., distinct from $f_1$, if and
only if there exists a non-null functional on $\mathfrak{L}_r$ which vanishes on the elements of
$\mathfrak{P}_0 = \{\pi_\theta, \theta \in \Theta\}$. And a necessary and sufficient condition that such a functional
exist is that $\mathfrak{P}_0$ be not a fundamental set in $\mathfrak{L}_r$ [13, p. 58, Theorem 7].

Observe finally that, in any case, $f_1$ is the unique unbiased estimate of $g$ with
vanishing s.a.c.m.

We collect these results in the following statement.

THEOREM 1. If $g(\theta) \equiv g_0$, a constant, then there is a unique best unbiased estimate
of $g$; viz., $f_1(x) \equiv g_0$. And the s.a.c.m. of $f_1$ is 0.

A necessary and sufficient condition that there exist no other unbiased estimates
of $g$ having finite s.a.c.m. is that the set $\mathfrak{P}_0$ be fundamental in $\mathfrak{L}_r$.

As an illustration of the ideas of this section, consider the following example:
$\Omega$ is the real interval [0, 1]; $\mu$ is Lebesgue measure; $\Theta$ is the set of non-negative
integers; and

$$p_\theta(x) = (\theta + 1)\, x^\theta.$$

And take $\theta_0 = 0$. Then, $\nu$ is again Lebesgue measure, and $\pi_\theta = p_\theta$ for each $\theta$.
For definiteness, take $r = 2$ (the results in this case are the same for any $r \ge 1$).
It is well-known that the non-negative integer powers of $x$ form a fundamental
set in $\mathfrak{L}_2$ on a finite real interval. That is, if $\xi$ is a function on [0, 1], such that
$\int_0^1 \xi^2\, dx < \infty$, and if $\epsilon > 0$, then there exist an integer $n$ and coefficients
$b_0, b_1, \cdots, b_n$ such that

$$\int_0^1 \left( \xi - \sum_{i=0}^{n} b_i x^i \right)^2 dx < \epsilon.$$

Hence, in this case an unbiased estimate with finite variance at $\theta = 0$ is unique
(as well for a non-constant function $g$ as for one which is constant over $\Theta$; cf.
section 4, Corollary 2-1).


⁵ That is, distinct from $f_1$ in the sense of $\mathfrak{L}_s$; or, equivalently, differing from $f_1$ on a set
of positive ($\nu$) measure. Whenever, in the sequel, an equation $\xi_1 = \xi_2$ appears, for two
functions $\xi_1$ and $\xi_2$ in $\mathfrak{L}_r$ or $\mathfrak{L}_s$, equality almost everywhere ($\nu$) in $\Omega$ will be understood.
It is a consequence of our postulate that if two functions on $\Omega$ are equal almost everywhere
($\nu$), they are equal almost everywhere ($\nu'$), where $\nu'$ is any one of the measures
$\nu'(A) = \int_A p_{\theta'}\, d\mu$, $\theta' \in \Theta$.
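The density of the powers of $x$ in the space of square-integrable functions on [0, 1] can be observed numerically. The following is a minimal sketch, not part of the original argument: the test function `xi`, the grid size, and the helper name `l2_error_of_best_polynomial` are all illustrative assumptions, and the integral is approximated by an average over a fine uniform grid.

```python
import numpy as np

def l2_error_of_best_polynomial(xi, degree, num_points=20001):
    """Approximate the minimum, over polynomials p of degree <= `degree`,
    of the integral of (xi - p)^2 over [0, 1] with Lebesgue measure."""
    x = np.linspace(0.0, 1.0, num_points)
    y = xi(x)
    # Least-squares fit on a fine uniform grid; Polynomial.fit rescales the
    # domain internally, which keeps the problem well conditioned.
    p = np.polynomial.Polynomial.fit(x, y, degree)
    # The mean squared residual on the grid approximates the integral.
    return float(np.mean((y - p(x)) ** 2))

def xi(x):
    # Hypothetical square-integrable test function with a kink at 1/2.
    return np.abs(x - 0.5)

degrees = (0, 2, 4, 8, 16)
errors = [l2_error_of_best_polynomial(xi, n) for n in degrees]
for n, e in zip(degrees, errors):
    print(f"degree {n:2d}: squared L2 error ~ {e:.3e}")
```

The printed errors shrink as the degree grows, which is the approximation property the example relies on: no non-null square-integrable function can be orthogonal to every power of $x$.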


4. The main theorem for non-constant h. We shall denote by $\mathfrak{M}_s$ the class
(or the set in $\mathfrak{L}_s$) of all unbiased estimates of $h$ that belong to $\mathfrak{L}_s$.

THEOREM 2. (i) A necessary and sufficient condition that $\mathfrak{M}_s$ be non-empty is
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \pi_{\theta_2}, \cdots, \pi_{\theta_n}$
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$,

$$\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le C \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_r. \tag{5}$$

(ii) For every $\phi \in \mathfrak{M}_s$, we have $\|\phi\|_s \ge C_0$, where $C_0$ is the g.l.b. of the set of
admissible constants $C$ in (5).
(iii) If $\mathfrak{M}_s$ is non-empty there is a unique $\phi_0 \in \mathfrak{M}_s$ with $\|\phi_0\|_s = C_0$. Thus,
$\phi_0$ is the unique unbiased estimate of $h$ which is best at $\theta_0$.
The non-constancy of $h$ clearly implies $C_0 > 0$.
The necessity of condition (5) is immediate. Suppose $\phi \in \mathfrak{M}_s$, so that $\phi$ satisfies
equations (4); then, for any $\theta_1, \theta_2, \cdots, \theta_n$, and any real numbers $a_1, a_2, \cdots, a_n$,

$$\sum_{i=1}^{n} a_i\, h(\theta_i) = \int_\Omega \left( \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right) \phi\, d\nu.$$

By the Hölder inequality it follows that

$$\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le \|\phi\|_s \cdot \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_r.$$

Hence (5) is satisfied with $C = \|\phi\|_s$.

Part (ii) of the theorem is hereby proved as well.

Suppose $\mathfrak{M}_s$ non-empty, and $\phi_0$, $\phi_1$ in $\mathfrak{M}_s$, such that $\|\phi_0\|_s = \|\phi_1\|_s = C_0$.
Then $\tfrac{1}{2}(\phi_0 + \phi_1) \in \mathfrak{M}_s$, and therefore

$$\tfrac{1}{2} \|\phi_0 + \phi_1\|_s \ge C_0.$$

But, by the Minkowski inequality,

$$\tfrac{1}{2} \|\phi_0 + \phi_1\|_s \le \tfrac{1}{2} \left( \|\phi_0\|_s + \|\phi_1\|_s \right) = C_0.$$

Hence

$$\|\phi_0 + \phi_1\|_s = \|\phi_0\|_s + \|\phi_1\|_s.$$

This equality implies $\phi_1 = a\, \phi_0$ for some positive $a$. But since the norms of $\phi_0$
and $\phi_1$ are equal (and $\ne 0$), $a$ must be unity. Thus the uniqueness of $\phi_0$ is proved.




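It remains now to prove, assuming (5) satisfied, the existence of $\phi_0$. Consider
the functional $F$ on $\mathfrak{P}_0$ defined by

$$F(\pi_\theta) = h(\theta).$$

The Hahn-Banach theorem alluded to in section 1 (viz., [13, p. 55, Theorem 4])
has precisely (5) as a necessary and sufficient condition for the existence of a
linear functional $G$ on $\mathfrak{L}_r$ satisfying

(a) $G(\pi_\theta) = h(\theta)$, $\theta \in \Theta$;
(b) $\|G\| \le C$;

where $\|G\|$ is the norm of $G$, i.e.,

$$\|G\| = \operatorname*{l.u.b.}_{\xi \in \mathfrak{L}_r} \frac{|G(\xi)|}{\|\xi\|_r}.$$

In particular, taking $C = C_0$, there is a linear functional $G_0$ on $\mathfrak{L}_r$ with

(a') $G_0(\pi_\theta) = h(\theta)$, $\theta \in \Theta$;
(b') $\|G_0\| \le C_0$.

But, for an element $\sum_{i=1}^{n} a_i \pi_{\theta_i}$ in the linear manifold $[\mathfrak{P}_0]$ spanned by the $\pi_\theta$,

$$G_0\!\left( \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right) = \sum_{i=1}^{n} a_i\, h(\theta_i),$$

so that

$$\|G_0\| \ge \operatorname*{l.u.b.}_{\xi \in [\mathfrak{P}_0]} \frac{|G_0(\xi)|}{\|\xi\|_r} = C_0.$$

Hence (b') is replaced by the precise statement

(b'') $\|G_0\| = C_0$.

Now the representation theorem for linear functionals on $\mathfrak{L}_r$ asserts the
existence of $\phi_0 \in \mathfrak{L}_s$, such that

$$G_0(\xi) = \int_\Omega \phi_0 \cdot \xi\, d\nu,$$

and

$$\|\phi_0\|_s = \|G_0\| = C_0.$$

This taken with (a') establishes the existence of $\phi_0 \in \mathfrak{L}_s$ satisfying

$$\int_\Omega \phi_0\, \pi_\theta\, d\nu = h(\theta), \quad \theta \in \Theta; \qquad \|\phi_0\|_s = C_0;$$

and this completes the proof of the theorem.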


















It is readily seen that $\mathfrak{M}_s$ will consist of more than just $\phi_0$ if and only if there
exists a non-null functional on $\mathfrak{L}_r$ which vanishes on $\mathfrak{P}_0$. Our discussion in
section 3 therefore enables us to assert the following.

COROLLARY 2-1. $\mathfrak{M}_s$, when it is non-empty, consists of $\phi_0$ alone if and only if
$\mathfrak{P}_0$ is fundamental in $\mathfrak{L}_r$.

A word is in order concerning the following two consequences of the
boundedness of the measure $\nu$: (i) if $\mathfrak{P}_0 \subset \mathfrak{L}_r$, then also $\mathfrak{P}_0 \subset \mathfrak{L}_{r'}$ for every $r' < r$; (ii) if
$\phi \in \mathfrak{L}_s$ then also $\phi \in \mathfrak{L}_{s'}$ for every $s' < s$. Otherwise stated: (i') if $\mathfrak{P}_0$ satisfies the
postulate of section 2 for the number $r$, it likewise satisfies this postulate for
every (admissible) $r' < r$; (ii') if $\mathfrak{M}_s$ is non-empty, then $\mathfrak{M}_{s'}$ is non-empty for
every $s' < s$. Regarding (i') we shall make only the obvious remark that although
$\mathfrak{P}_0$ satisfies the postulate for every $r' < r$, there may be values of $r' < r$ such
that no $C$ for (5) exists; this will be exemplified in section 9. Where (ii') is
concerned, it is clear that the non-emptiness of $\mathfrak{M}_s$ will not necessarily imply that
$\mathfrak{P}_0 \subset \mathfrak{L}_{s'/(s'-1)}$ for every $s' < s$, even though for every such $s'$, $\mathfrak{M}_{s'}$ is non-empty.
If for every $\theta \in \Theta$ other than $\theta_0$ we have $\pi_\theta \notin \mathfrak{L}_{s'/(s'-1)}$, for some particular $s' < s$,
then we may have the situation in which there are elements in $\mathfrak{M}_{s'}$ with norms
arbitrarily close to 0. However, this cannot be the case if (a) for some $\theta$ other
than $\theta_0$, $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$, and (b) $h$ does not vanish identically on $\Theta'$, the set of those
$\theta$ for which $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$. For, when these two conditions are satisfied, Theorem
2 applies to $h$ as defined on $\Theta'$; consequently there is a positive lower bound
for the $s'$-norms of the unbiased estimates of $h$ over $\Theta'$. And since every
element of $\mathfrak{M}_{s'}$ is, in particular, an unbiased estimate of $h$ over $\Theta'$, it follows that
the norms of those elements are bounded below by a positive number.



























5. The case $s = \infty$ ($r = 1$). Let $\mathfrak{M}_\infty$ denote the class of essentially bounded
($\nu$) unbiased estimates of $h$; and let bestness at $\theta_0$ be defined with respect to the
essential absolute suprema of the elements of this class. That is, the unbiased
estimate $\phi_0$, of $h$, is best at $\theta_0$ if

$$\operatorname*{ess\,sup}_{x \in \Omega} |\phi_0(x)| < \infty,$$

and if, when $\phi$ is another unbiased estimate of $h$, we have

$$\operatorname*{ess\,sup}_{x \in \Omega} |\phi_0(x)| \le \operatorname*{ess\,sup}_{x \in \Omega} |\phi(x)|.$$

The fundamental postulate for the functions $\pi_\theta$ is, in this case, that $\mathfrak{P}_0 \subset \mathfrak{L}_1$.
Now, $\mathfrak{L}_\infty$, the space of essentially bounded, measurable ($\nu$) functions on $\Omega$,
normed by ess. sup., is the space of linear functionals on $\mathfrak{L}_1$ [14, p. 338].
Examination of the proof of Theorem 2 will show that that proof goes through also in the
present case in all but one detail: we cannot here in general prove the uniqueness
of the best estimate. The proof of uniqueness breaks down since the equality

$$\operatorname{ess\,sup} |\phi_0(x) + \phi_1(x)| = \operatorname{ess\,sup} |\phi_0(x)| + \operatorname{ess\,sup} |\phi_1(x)|$$

does not imply that $\phi_1$ is a constant multiple of $\phi_0$. Of course, if $\mathfrak{P}_0$ is fundamental
in $\mathfrak{L}_1$, we have a fortiori the uniqueness of the best estimate.

The results for the case $s = \infty$ are then the following.

THEOREM 3. (i) A necessary and sufficient condition that $\mathfrak{M}_\infty$ be non-empty is
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \pi_{\theta_2}, \cdots, \pi_{\theta_n}$
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$,

$$\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le C \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_1.$$

(ii) For every $\phi \in \mathfrak{M}_\infty$ we have $\|\phi\|_\infty \ge C_0$, where $C_0$ is the g.l.b. of the set of
admissible constants $C$ above.

(iii) When $\mathfrak{M}_\infty$ is non-empty, it contains elements with norm equal to $C_0$. These
are the best (at $\theta_0$) unbiased estimates of $h$. When $\mathfrak{P}_0$ is not fundamental in $\mathfrak{L}_1$,
there need not exist a unique best estimate.

We close this section with the remark that Theorem 1 remains valid, as it
stands, in the case $s = \infty$.

6. Particular lower bounds for the minimum s.a.c.m. In order to stress their
significance in the statistical context, we shall give the statements of this section
with the help of the symbol $\sigma_s(\phi)$ for the $s$th root of the s.a.c.m. of the unbiased
estimate $\phi$, of $h$. We have of course, the relation

$$\sigma_s(\phi) = \|\phi\|_s.$$

Now, one of the most important aspects of Theorem 2 is that it presents us
immediately with an explicit evaluation of the minimum $\sigma_s(\phi)$ for all $\phi \in \mathfrak{M}_s$.
We state the formula in the form of a theorem.

THEOREM 4. Let $R$ denote the set of all real numbers. Then,

$$\operatorname*{g.l.b.}_{\phi \in \mathfrak{M}_s} \sigma_s(\phi) = \operatorname*{l.u.b.}_{\substack{\theta_1, \theta_2, \cdots, \theta_n \in \Theta \\ a_1, a_2, \cdots, a_n \in R \\ n = 1, 2, \cdots}} \frac{\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right|}{\left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_r}.$$

For brevity, let us set

$$\operatorname*{g.l.b.}_{\phi \in \mathfrak{M}_s} \sigma_s(\phi) = \sigma_s^{\min}.$$

Since this theorem expresses $\sigma_s^{\min}$ as the l.u.b. of an explicit set of numbers,
it is clear that the class of all lower bounds of $\sigma_s^{\min}$ is thereby thrown open to us.
It follows that, when $s = r = 2$ and our hypotheses on $\mathfrak{P}$ are fulfilled, the classical
lower bounds of Cramér-Rao [3, p. 480] and Bhattacharyya [4, p. 3] are
particularized consequences of Theorem 4. In the results that follow here we shall
indicate the deduction of those classical bounds. We need not, however, restrict $s$.
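Any single admissible choice of points and coefficients in Theorem 4 already yields a lower bound. The sketch below illustrates this for $s = r = 2$ under assumptions that are not in the text: a binomial model with $n$ trials, $g(\theta) = \theta$, and the two-point choice $a_1 = 1$ at $\theta$ and $a_2 = -1$ at $\theta_0$ (where the density ratio is identically 1 and $h$ vanishes). The function names are hypothetical. The squared ratio is compared with the variance $\theta_0(1 - \theta_0)/n$ of the unbiased estimate $X/n$.

```python
from math import comb

def binom_pmf(k, n, theta):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def two_point_variance_bound(n, theta0, theta):
    """Squared ratio |h(theta)|^2 / ||pi_theta - pi_theta0||_2^2 for
    g(theta) = theta, where the norm is taken under the measure nu
    with density p_theta0 (so pi_theta0 is identically 1)."""
    denom_sq = sum(
        (binom_pmf(k, n, theta) / binom_pmf(k, n, theta0) - 1.0) ** 2
        * binom_pmf(k, n, theta0)
        for k in range(n + 1)
    )
    return (theta - theta0) ** 2 / denom_sq

n, theta0 = 10, 0.3
variance_of_mean = theta0 * (1 - theta0) / n  # variance of X/n at theta0
for theta in (0.8, 0.5, 0.35, 0.301):
    bound = two_point_variance_bound(n, theta0, theta)
    print(f"theta = {theta}: bound {bound:.6f} <= variance {variance_of_mean:.6f}")
```

Every such two-point bound stays below the variance of $X/n$, and it approaches that variance as $\theta$ tends to $\theta_0$, which anticipates the derivative form of the bound obtained in the theorems of this section.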
For a moment, let us denote by $\pi(x)$ the function on $\Theta$ which assigns the
value $\pi_\rho(x)$ to the point $\rho \in \Theta$, and let $\Theta$ be an interval on the real axis. Then we
shall, below, write $\pi'_\theta$ for the function (when it exists) on $\Omega$ which assigns the
value $(d\pi(x)/d\rho)_{\rho = \theta}$ to $x \in \Omega$. Similarly, $\pi''_\theta$ for the function assigning the value
$(d^2\pi(x)/d\rho^2)_{\rho = \theta}$ to $x$; and so on.

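THEOREM 5. Suppose the following conditions fulfilled:

(i) $\Theta = I$, an interval on the real axis;

(ii) $h$ is differentiable on $\Theta' \subseteq I$;

(iii) for each $\theta \in \Theta'$, $\pi'_\theta$ is defined almost everywhere ($\nu$), and is an element of $\mathfrak{L}_r$;

(iv) for each $\theta \in \Theta'$,

$$\lim_{\rho \to \theta} \left\| \frac{\pi_\rho - \pi_\theta}{\rho - \theta} - \pi'_\theta \right\|_r = 0.$$

Then, for any $m + n$ ($m, n = 1, 2, \cdots$) points $\theta_1, \theta_2, \cdots, \theta_m$ in $I$, and $\theta'_1, \theta'_2,
\cdots, \theta'_n$ in $\Theta'$, and any $m + n$ real numbers $a_1, a_2, \cdots, a_m, b_1, b_2, \cdots, b_n$ such that

$$\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \pi'_{\theta'_i} \right\|_r \ne 0,$$

we have

$$\sigma_s^{\min} \ge \frac{\left| \sum_{i=1}^{m} a_i\, h(\theta_i) + \sum_{i=1}^{n} b_i\, h'(\theta'_i) \right|}{\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \pi'_{\theta'_i} \right\|_r}. \tag{6}$$

The prime on the $h$ in (6) denotes the derivative of $h$.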


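To prove this theorem, observe first that by virtue of Theorem 4, we may write

$$\sigma_s^{\min} \ge \frac{\left| \sum_{i=1}^{m} a_i\, h(\theta_i) + \sum_{i=1}^{n} b_i\, \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \right|}{\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \right\|_r}$$

for every set of points $\rho_1, \rho_2, \cdots, \rho_n$ in $I$ such that the denominator of the
right-hand side is defined and $\ne 0$. Therefore, also

$$\sigma_s^{\min} \ge \lim_{\substack{\rho_i \to \theta'_i \\ i = 1, 2, \cdots, n}} \frac{\left| \sum_{i=1}^{m} a_i\, h(\theta_i) + \sum_{i=1}^{n} b_i\, \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \right|}{\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \right\|_r}. \tag{7}$$

Now, by condition (iv), the element

$$\sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \frac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i}$$

of $\mathfrak{L}_r$ converges, in the strong sense in $\mathfrak{L}_r$, to

$$\sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \pi'_{\theta'_i},$$

as $\rho_i \to \theta'_i$, $i = 1, 2, \cdots, n$. Consequently we have convergence of the norm;
that is, the denominator of the right-hand side of (7) converges to the
denominator of (6). (The latter is $\ne 0$, so that for all $\rho_i$ sufficiently close to $\theta'_i$, $i =
1, 2, \cdots, n$, the ratios in (7) are defined.) There is no difficulty about the
convergence of the numerator of (7) to that of (6). The theorem is thus proved.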

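COROLLARY 5-1. Under the hypothesis of Theorem 5, we have, in particular,
when $\theta_0 \in \Theta'$ and $\|\pi'_{\theta_0}\|_r \ne 0$,

$$\sigma_s^{\min} \ge \frac{|h'(\theta_0)|}{\|\pi'_{\theta_0}\|_r}. \tag{8}$$

If we denote by $p$ the function on $\Omega \times \Theta$ which assigns the value $p_\theta(x)$ to the
point $(x, \theta)$, and write (8) in the form

$$\sigma_s^{\min} \ge \frac{|h'(\theta_0)|}{\left\{ \displaystyle\int_\Omega \left| \left( \frac{\partial \log p}{\partial \theta} \right)_{\theta = \theta_0} \right|^r p_{\theta_0}\, d\mu \right\}^{1/r}}, \tag{8'}$$

the generalization of the Cramér-Rao inequality afforded by (8) becomes
evident.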
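As a check on the last remark, the classical case can be written out. The following is a sketch of the specialization of (8') to $s = r = 2$; the symbol $I(\theta_0)$ for the information integral is notation introduced here for illustration and does not appear in the text above. Since $h = g - g(\theta_0)$, we have $h' = g'$.

```latex
% Specialization of (8') to s = r = 2 (illustrative sketch;
% I(\theta_0) is notation introduced here, not used in the paper).
\[
\bigl(\sigma_2^{\min}\bigr)^2 \;\ge\;
\frac{[h'(\theta_0)]^2}
     {\displaystyle\int_\Omega
        \Bigl(\frac{\partial \log p}{\partial \theta}\Bigr)^{2}_{\theta=\theta_0}
        p_{\theta_0}\, d\mu}
\;=\; \frac{[g'(\theta_0)]^2}{I(\theta_0)},
\qquad
I(\theta_0) = \int_\Omega
   \Bigl(\frac{\partial \log p}{\partial \theta}\Bigr)^{2}_{\theta=\theta_0}
   p_{\theta_0}\, d\mu .
\]
```

Here $(\sigma_2^{\min})^2$ is the minimum variance at $\theta_0$, so the display is the familiar information bound for an unbiased estimate of $g$.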

Using the result and method of Theorem 5, we can establish the next in a 
hierarchy of theorems. 

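THEOREM 6. Suppose the hypothesis of Theorem 5 satisfied, and the following
condition fulfilled: for each $\theta$ in a non-empty subset $\Theta''$ of $\Theta'$, (i) $h''(\theta)$ (the second
derivative) exists and (ii) $\pi''_\theta$ is defined almost everywhere ($\nu$), is an element of $\mathfrak{L}_r$
and satisfies

$$\lim_{\rho \to \theta} \left\| \frac{\pi'_\rho - \pi'_\theta}{\rho - \theta} - \pi''_\theta \right\|_r = 0.$$

Then, for any $m + n + q$ ($m, n, q = 1, 2, \cdots$) points $\theta_1, \theta_2, \cdots, \theta_m$ in $I$,
$\theta'_1, \theta'_2, \cdots, \theta'_n$ in $\Theta'$, and $\theta''_1, \theta''_2, \cdots, \theta''_q$ in $\Theta''$, and any $m + n + q$ real numbers
$a_1, a_2, \cdots, a_m, b_1, b_2, \cdots, b_n, c_1, c_2, \cdots, c_q$ such that

$$\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i\, \pi''_{\theta''_i} \right\|_r \ne 0,$$

we have

$$\sigma_s^{\min} \ge \frac{\left| \sum_{i=1}^{m} a_i\, h(\theta_i) + \sum_{i=1}^{n} b_i\, h'(\theta'_i) + \sum_{i=1}^{q} c_i\, h''(\theta''_i) \right|}{\left\| \sum_{i=1}^{m} a_i\, \pi_{\theta_i} + \sum_{i=1}^{n} b_i\, \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i\, \pi''_{\theta''_i} \right\|_r}.$$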

Just as in the case of the previous theorem, we have here an immediate corollary. 
COROLLARY 6-1. Under the hypothesis of Theorem 6, we have in particular, when
$\theta_0 \in \Theta' \cap \Theta''$,

$$\sigma_0^{(s)} \geq \frac{|b\, h'(\theta_0) + c\, h''(\theta_0)|}{\| b\, \pi'_{\theta_0} + c\, \pi''_{\theta_0} \|_r} \tag{9}$$

for any two real numbers, $b$ and $c$, such that the denominator of the right-hand side
does not vanish.
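Consider (9) in the particular case $s = r = 2$. In this case, (9) may be written,
explicitly,

$$\left(\sigma_0^{(2)}\right)^2 \geq \frac{[b\, h'(\theta_0) + c\, h''(\theta_0)]^2}{\int_\Omega \frac{1}{p_{\theta_0}} \left( b \frac{\partial p}{\partial \theta} + c \frac{\partial^2 p}{\partial \theta^2} \right)^2_{\theta_0} d\mu}. \tag{10}$$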


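In particular, (10) holds for values of $b$ and $c$ which maximize the right-hand
side. And that maximum value is found, in the usual way, to be

$$J^{11}[h'(\theta_0)]^2 + 2J^{12} h'(\theta_0) h''(\theta_0) + J^{22}[h''(\theta_0)]^2,$$

where the matrix

$$\begin{pmatrix} J^{11} & J^{12} \\ J^{12} & J^{22} \end{pmatrix}$$

is the inverse of the matrix

$$\begin{pmatrix}
\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \right)^2_{\theta_0} d\mu &
\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \right)_{\theta_0} d\mu \\
\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \right)_{\theta_0} d\mu &
\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial^2 p}{\partial \theta^2} \right)^2_{\theta_0} d\mu
\end{pmatrix}.$$

Thus, we have

$$\left(\sigma_0^{(2)}\right)^2 \geq J^{11}[h'(\theta_0)]^2 + 2J^{12} h'(\theta_0) h''(\theta_0) + J^{22}[h''(\theta_0)]^2. \tag{11}$$

This is seen to be Bhattacharyya's result for the case of derivatives up to second
order.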
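
As an illustration of (10) and (11), the following minimal sketch evaluates the second-order bound for one concrete case chosen here for convenience (it is not taken from the text): a single observation from the normal density with mean $\theta$ and unit variance, $\theta_0 = 0$, and $h(\theta) = \theta^2$. For that case the three integrals in the matrix are $1$, $0$ and $2$, and $h'(0) = 0$, $h''(0) = 2$. The function names are hypothetical.

```python
def bhattacharyya_bound_2(h1, h2, g11, g12, g22):
    """Right-hand side of (11): quadratic form of (h', h'') in the inverse
    of the symmetric 2x2 matrix with entries g11, g12, g22."""
    det = g11 * g22 - g12 * g12
    if det <= 0:
        raise ValueError("the 2x2 matrix must be positive definite")
    # Entries J^{11}, J^{12}, J^{22} of the inverse matrix.
    j11, j12, j22 = g22 / det, -g12 / det, g11 / det
    return j11 * h1 * h1 + 2.0 * j12 * h1 * h2 + j22 * h2 * h2


def ratio(b, c, h1, h2, g11, g12, g22):
    """Right-hand side of (10) for one particular choice of b and c."""
    numerator = (b * h1 + c * h2) ** 2
    denominator = g11 * b * b + 2.0 * g12 * b * c + g22 * c * c
    return numerator / denominator


if __name__ == "__main__":
    # Assumed illustrative case: N(theta, 1), theta_0 = 0, h(theta) = theta^2.
    bound = bhattacharyya_bound_2(0.0, 2.0, 1.0, 0.0, 2.0)
    print(bound)                                      # 2.0
    print(ratio(0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 2.0))   # attains the bound
```

In this assumed case the first-order bound (8) is zero because $h'(0) = 0$, while the second-order bound equals 2, which is the variance of $x^2 - 1$ at $\theta = 0$.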

It is obvious how we extend Theorem 6 to obtain a similar result involving the
functions $\pi_\theta, \pi'_\theta, \ldots, \pi^{(n)}_\theta$ for any assigned $n$. And it is thereafter clear
how, in the case $s = r = 2$, Bhattacharyya's general inequality may be deduced.

It is clear that we can proceed from Theorem 4, under suitable conditions,
to lower bounds for $\sigma_0^{(s)}$ which involve integrals of the functions $\pi_\theta(x)$ (and the
corresponding integrals of $h$) as well as the derivatives of these functions.

In closing this section we note that all the above considerations apply equally
to the case $s = \infty$.

7. Determination of the best estimate. We shall now prove the following
theorem, which provides an explicit construction of the best (at $\theta_0$) estimate of $h$.
We repeat that $s$ is now taken to be finite.

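THEOREM 7. Let $\mathfrak{M}_s$ be non-empty, and $\varphi_0$ be the best (at $\theta_0$) unbiased estimate of $h$.
Let $\{\theta_i^n, i = 1, 2, \ldots, k_n\}$, $n = 1, 2, \ldots$, be a sequence of (finite) sets of points of
$\Theta$, and $\{a_i^n, i = 1, 2, \ldots, k_n\}$, $n = 1, 2, \ldots$, a sequence of sets of real numbers,
such that

$$\lim_{n \to \infty} \frac{\left| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \right|}{\left\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \right\|_r} = C_0 \quad \left(= \|\varphi_0\|_s = \sigma_0^{(s)}\right).$$

Then the functions $\varphi_n$:

$$\varphi_n(x) = \frac{\sum_{i=1}^{k_n} a_i^n h(\theta_i^n)}{\left\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \right\|_r^r} \left| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \right|^{r/s} \operatorname{sgn}\left( \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \right)$$

(are elements of $\mathfrak{L}_s$, and) converge strongly in $\mathfrak{L}_s$ to $\varphi_0$.
The strong convergence here means precisely that

$$\lim_{n \to \infty} \int_\Omega |\varphi_n - \varphi_0|^s \, d\nu = 0.$$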

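Clearly, we may, with no loss in generality, assume the numbers $a_i^n$ to be
such that

$$\left\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \right\|_r = 1, \quad n = 1, 2, \ldots \tag{12}$$

We shall suppose this to be the case throughout the proof. Then the essential
property of the $\theta_i^n$ and the $a_i^n$ is that

$$\lim_{n \to \infty} \left| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \right| = C_0. \tag{13}$$

And in this normalized situation, the functions $\varphi_n$ will be given by

$$\varphi_n(x) = \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \cdot \left| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \right|^{r/s} \operatorname{sgn}\left( \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \right). \tag{14}$$

That these functions are elements of $\mathfrak{L}_s$ is easily seen; in fact,

$$\|\varphi_n\|_s = \left| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \right|.$$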


The proof of this theorem will consist mainly in the application of the following 
two lemmas. 

LEMMA 2. Let $0 \neq \eta \in \mathfrak{L}_s$, and $\{\xi_n, n = 1, 2, \ldots\}$ be a sequence of functions in
$\mathfrak{L}_r$ such that

(i) $\|\xi_n\|_r = 1$, $n = 1, 2, \ldots$

(ii) $\lim_{n \to \infty} \int_\Omega \xi_n \eta \, d\nu = \|\eta\|_s$.

Then $\xi_n$ converges strongly in $\mathfrak{L}_r$ to the function

$$\xi = \frac{1}{\|\eta\|_s^{s/r}} |\eta|^{s/r} \operatorname{sgn} \eta.$$

Let us observe first that

$$\int_\Omega \xi \eta \, d\nu = \|\eta\|_s \tag{15}$$

and

$$\|\xi\|_r = 1.$$

Furthermore, $\xi$ is the unique element with norm $\leq 1$ in $\mathfrak{L}_r$ having the property
(15). For, if also,

$$\int_\Omega \xi_0 \eta \, d\nu = \|\eta\|_s, \quad \|\xi_0\|_r \leq 1,$$

we then have

$$\int_\Omega \tfrac{1}{2}(\xi + \xi_0) \cdot \eta \, d\nu = \|\eta\|_s$$

and from this,

$$\tfrac{1}{2} \|\xi + \xi_0\|_r \, \|\eta\|_s \geq \|\eta\|_s.$$

That is,

$$\|\xi + \xi_0\|_r \geq 2 \geq \|\xi\|_r + \|\xi_0\|_r.$$

From this, and (Minkowski)

$$\|\xi + \xi_0\|_r \leq \|\xi\|_r + \|\xi_0\|_r,$$

we have

$$\|\xi + \xi_0\|_r = \|\xi\|_r + \|\xi_0\|_r.$$

Therefore, for some $a > 0$, $\xi_0 = a\xi$. But we must have $a = 1$ if $\xi_0$ and $\xi$ are
both to satisfy (15), as assumed. Hence $\xi_0 = \xi$.

Now consider the sequence $\{\xi_n\}$. Choose a sub-sequence $\{\xi_{n_i}\}$ that converges
weakly to, say, $\xi'$. Then $\|\xi'\|_r \leq 1$. We have

$$\int_\Omega \xi' \eta \, d\nu = \lim_{i \to \infty} \int_\Omega \xi_{n_i} \eta \, d\nu = \|\eta\|_s.$$

Hence, $\xi' = \xi$. And since $1 = \|\xi_{n_i}\|_r \to 1 = \|\xi\|_r$, it follows that $\xi_{n_i}$ converges
strongly to $\xi$ (cf. [13, p. 139, section 3]).

Suppose there is a subsequence $\{\xi_{n_j}\}$ of $\{\xi_n\}$ such that

$$\|\xi_{n_j} - \xi\|_r > \delta > 0, \quad j = 1, 2, \ldots$$

We have, nonetheless, for this subsequence, the hypotheses of our lemma
satisfied. We can therefore apply the argument of the previous paragraph to
extract a subsequence of $\{\xi_{n_j}\}$ which converges strongly to $\xi$. This is in obvious
contradiction to the above $\delta$-assumption, and the lemma is hereby proved.

LEMMA 3. Lemma 2 remains true with the roles of $\mathfrak{L}_r$ and $\mathfrak{L}_s$ interchanged.
This is obvious.
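Returning now to the proof of Theorem 7, let us first, for the sake of brevity,
introduce the notation:

$$c_n = \sum_{i=1}^{k_n} a_i^n h(\theta_i^n), \qquad \gamma_n = \operatorname{sgn}\left( \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \right), \qquad \psi_n = \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}.$$

From

$$\int_\Omega \varphi_0 \pi_\theta \, d\nu = h(\theta), \quad \theta \in \Theta,$$

we easily obtain

$$\int_\Omega \varphi_0 \psi_n \, d\nu = c_n, \quad n = 1, 2, \ldots$$

which we may write

$$\int_\Omega \varphi_0 \cdot \gamma_n \psi_n \, d\nu = |c_n|, \quad n = 1, 2, \ldots$$

Since $|c_n| \to \|\varphi_0\|_s$ (cf. (13)) and $\|\gamma_n \psi_n\|_r = 1$, $n = 1, 2, \ldots$ (cf. (12)), we
have, by Lemma 2, that $\gamma_n \psi_n$ converges strongly to

$$\psi_0 = \frac{1}{C_0^{s/r}} |\varphi_0|^{s/r} \operatorname{sgn} \varphi_0. \tag{16}$$

The functions (cf. (14))

$$\varphi_n = c_n |\psi_n|^{r/s} \operatorname{sgn} \psi_n$$

obviously satisfy

$$\int_\Omega \varphi_n \cdot \gamma_n \psi_n \, d\nu = |c_n|, \quad n = 1, 2, \ldots$$

And from this we conclude that

$$\lim_{n \to \infty} \int_\Omega \varphi_n \psi_0 \, d\nu = C_0,$$

or

$$\lim_{n \to \infty} \int_\Omega \frac{\varphi_n}{|c_n|} \psi_0 \, d\nu = 1 = \|\psi_0\|_r.$$

We may apply Lemma 3 to this result, since $\| \varphi_n / |c_n| \|_s = 1$, $n = 1, 2, \ldots$.
And we thereby conclude that $\varphi_n / |c_n|$ converges strongly to

$$|\psi_0|^{r/s} \operatorname{sgn} \psi_0,$$

which, on substituting from the definition (16) of $\psi_0$, we find to be just

$$\frac{\varphi_0}{C_0}.$$

Since $|c_n| \to C_0$, it follows immediately that $\varphi_n$ converges strongly to $\varphi_0$, and
the theorem is proved.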
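The following corollary is actually of greater use in applications than Theorem
7 itself, for the reason that it leaves no doubt about the form of $\lim \varphi_n$ (i.e., $\varphi_0$)
when we know explicitly the form of $\lim \gamma_n \psi_n$.

COROLLARY 7-1. Assume the hypothesis of Theorem 7. Then the functions

$$\frac{\operatorname{sgn}\left( \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \right)}{\left\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \right\|_r} \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}$$

converge strongly, in $\mathfrak{L}_r$, to a function $\psi_0$, and

$$\varphi_0 = C_0 |\psi_0|^{r/s} \operatorname{sgn} \psi_0.$$

This is clear from the proof of the theorem.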
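
The construction of $\varphi_n$ in Theorem 7 can be checked numerically on a finite sample space. The sketch below is only an illustration under assumptions made here: $\Omega$ has three points, `nu` holds the weights $p_{\theta_0}(x)$, and `pis` holds two functions $\pi_\theta$ as lists over $\Omega$. The names `lr_norm` and `phi_n` are hypothetical. It verifies the two identities used in the proof, namely that the $\mathfrak{L}_s$ norm of $\varphi_n$ equals $|c_n| / \|\psi_n\|_r$ and that $\int \varphi_n \psi_n \, d\nu = c_n$.

```python
import math


def lr_norm(f, nu, r):
    """Norm of f in L_r with respect to the discrete measure nu."""
    return sum(abs(v) ** r * w for v, w in zip(f, nu)) ** (1.0 / r)


def phi_n(a, h_vals, pis, nu, r):
    """phi_n of Theorem 7 (unnormalized coefficients), with 1/r + 1/s = 1.

    a      : coefficients a_i
    h_vals : values h(theta_i)
    pis    : functions pi_{theta_i}, each a list over the sample space
    nu     : weights of the measure nu on the sample space
    Returns (phi_n as a list, signed ratio c_n / ||psi_n||_r).
    """
    s = r / (r - 1.0)
    psi = [sum(ai * p[x] for ai, p in zip(a, pis)) for x in range(len(nu))]
    c = sum(ai * hi for ai, hi in zip(a, h_vals))
    norm_r = lr_norm(psi, nu, r)
    scale = c / norm_r ** r
    phi = [scale * abs(v) ** (r / s) * math.copysign(1.0, v) if v != 0 else 0.0
           for v in psi]
    return phi, c / norm_r


if __name__ == "__main__":
    nu = [0.5, 0.3, 0.2]                          # assumed p_{theta_0}
    pis = [[1.0, 1.0, 1.0], [0.4, 1.5, 1.75]]     # each integrates to 1 under nu
    phi, signed_ratio = phi_n([-1.0, 1.0], [0.0, 0.3], pis, nu, 3.0)
    print(lr_norm(phi, nu, 1.5), abs(signed_ratio))  # the two values agree
```

The absolute value of the returned ratio is one term of the sequence whose limit is $C_0$ in the hypothesis of the theorem, so the same routine can be used to watch the ratios increase as points $\theta_i$ are added.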


By way of illustrating the application of these results, we shall prove the 
following theorem. 


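THEOREM 8. Assume the hypothesis of Theorem 5. And, further, let the equality
sign hold in (8). Then,

$$\varphi_0(x) = \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^r} \, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x).$$

Since (8) is an equality, we may, under the hypothesis of Theorem 5, consider
that we have

$$C_0 = \lim_{n \to \infty} \frac{\left| \dfrac{h(\rho_n) - h(\theta_0)}{\rho_n - \theta_0} \right|}{\left\| \dfrac{\pi_{\rho_n} - \pi_{\theta_0}}{\rho_n - \theta_0} \right\|_r}, \tag{17}$$

where $\{\rho_n\}$ is a sequence in $\Theta$ converging to $\theta_0$. The numerator of the right-hand
side of (17), sans the vertical bars, converges to $h'(\theta_0)$ (which is $\neq 0$, since
$C_0 \neq 0$); hence, for all sufficiently large $n$, that expression has the signum of
$h'(\theta_0)$. The functions whose norms appear in the denominator of (17) we know
to converge strongly in $\mathfrak{L}_r$ to $\pi'_{\theta_0}$ (by the hypothesis of Theorem 5). Hence, for
this case, the function $\psi_0$ of Corollary 7-1 is

$$\psi_0 = \frac{\operatorname{sgn} h'(\theta_0)}{\|\pi'_{\theta_0}\|_r} \, \pi'_{\theta_0}.$$

Therefore, by the same corollary,

$$\varphi_0(x) = \frac{|h'(\theta_0)|}{\|\pi'_{\theta_0}\|_r} \cdot \left| \frac{\operatorname{sgn} h'(\theta_0)}{\|\pi'_{\theta_0}\|_r} \pi'_{\theta_0}(x) \right|^{r/s} \cdot \operatorname{sgn} h'(\theta_0) \cdot \operatorname{sgn} \pi'_{\theta_0}(x)
= \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^r} \, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x).$$

And this is the result asserted in the theorem.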

The reader will have no difficulty in establishing, in the exact pattern of the 
preceding proof, the following. 

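THEOREM 9. Assume the hypothesis of Theorem 6. And, further, let the equality
sign hold in (9) for $b = b_0$, $c = c_0$.⁸ Then,

$$\varphi_0(x) = \frac{b_0 h'(\theta_0) + c_0 h''(\theta_0)}{\| b_0 \pi'_{\theta_0} + c_0 \pi''_{\theta_0} \|_r^r} \, | b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x) |^{r/s} \operatorname{sgn}\left( b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x) \right).$$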


It is evident that results of the type in these theorems may be built up as 
well with integrals over the parameter space. 

A question of considerable practical importance is that of the rapidity of
convergence of the $\varphi_n$ to $\varphi_0$. An answer to this question, on the level of generality
we are maintaining in this study, consists in relating this convergence to that
of the $|c_n|$ to $C_0$. In the case $s = r = 2$, the answer is immediate and exact:

$$\|\varphi_n - \varphi_0\|_2^2 = \int_\Omega (\varphi_n - \varphi_0)^2 \, d\nu
= \int_\Omega \varphi_n^2 \, d\nu - 2 \int_\Omega \varphi_0 \varphi_n \, d\nu + \int_\Omega \varphi_0^2 \, d\nu
= |c_n|^2 - 2|c_n|^2 + C_0^2
= C_0^2 - |c_n|^2.$$

Thus, if one unbiased estimate is known, it provides, since its norm is $\geq C_0$,
an upper bound for $\|\varphi_n - \varphi_0\|_2$. The same is true in the general case (any $s$)
once we have established an upper bound, depending on $C_0$ and $|c_n|$, for
$\|\varphi_n - \varphi_0\|_s$. But in the general case, a good upper bound does not seem to be
so close at hand. There are indications of the direction in which one must proceed,
and we hope to draw some significant results out of these before long.


8. The case s = r = 2. The particular aspects of this case (where bestness
of an estimate has reference to its variance), which arise out of the coincidence of
$\mathfrak{L}_r$ and $\mathfrak{L}_s$, merit some discussion. We shall denote the inner product, $\int_\Omega \xi \eta \, d\nu$, of
two functions $\xi$ and $\eta$ in $\mathfrak{L}_2$ as usual by $(\xi, \eta)$. Let $\{P_0\}$ denote the closed linear
manifold in $\mathfrak{L}_2$ spanned by the $\pi_\theta$.

THEOREM 10. Let $\mathfrak{M}_2$ be non-empty. Then $\varphi_0$ is the unique element of $\mathfrak{M}_2$ which
lies in $\{P_0\}$.


⁸ In the case $s = 2$, $b_0$ and $c_0$ are the values which render (11) an equality.




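To begin with it is clear that the functions $\varphi_n$ of Theorem 7, in the present case
$s = r = 2$, are all elements of $[P_0]$, the linear manifold spanned by the $\pi_\theta$.
Hence, since $\varphi_0$ is the strong limit of these elements, $\varphi_0 \in \{P_0\}$.

Now suppose also $\varphi_1 \in \mathfrak{M}_2$, $\varphi_1 \in \{P_0\}$. Then, from

$$(\varphi_0, \pi_\theta) = h(\theta), \quad \theta \in \Theta,$$
$$(\varphi_1, \pi_\theta) = h(\theta), \quad \theta \in \Theta,$$

we have

$$(\varphi_1 - \varphi_0, \pi_\theta) = 0, \quad \theta \in \Theta,$$

and, by continuity of the inner product,

$$(\varphi_1 - \varphi_0, \xi) = 0, \quad \xi \in \{P_0\};$$

that is, $\varphi_1 - \varphi_0 \in \{P_0\}^\perp$. But, from $\varphi_0 \in \{P_0\}$ and $\varphi_1 \in \{P_0\}$ it follows that
$\varphi_1 - \varphi_0 \in \{P_0\}$. Hence $\varphi_1 - \varphi_0 = 0$, and this proves the exclusiveness of the
property for $\varphi_0$.

Another characterization of $\varphi_0$ is given by the following corollary.

COROLLARY 10-1. If $\mathfrak{M}_2$ is non-empty, then $\varphi_0$ is the unique element of $\mathfrak{M}_2$ which
satisfies the system of equations in $\xi$: $(\varphi, \xi) = \|\xi\|_2^2$, $\varphi \in \mathfrak{M}_2$.

To see that $\varphi_0$ has the asserted property, let $\varphi$ be any element of $\mathfrak{M}_2$, and set
$\varphi = \xi + \eta$, with $\xi \in \{P_0\}$ and $\eta \in \{P_0\}^\perp$. From

$$(\xi, \pi_\theta) = (\xi + \eta, \pi_\theta) = (\varphi, \pi_\theta) = h(\theta),$$

it follows that $\xi \in \mathfrak{M}_2$. Hence $\xi = \varphi_0$. And so,

$$(\varphi, \varphi_0) = (\varphi_0 + \eta, \varphi_0) = \|\varphi_0\|_2^2.$$

If $\varphi_1 \in \mathfrak{M}_2$ has this property also, then both

$$(\varphi_1, \varphi_0) = \|\varphi_0\|_2^2$$

and

$$(\varphi_0, \varphi_1) = \|\varphi_1\|_2^2;$$

and therefore

$$\|\varphi_1\|_2 = \|\varphi_0\|_2.$$

This proves $\varphi_1 = \varphi_0$, and so the corollary.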


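9. An example. Let $\Omega$ be Euclidean $n$-space, $x = (x_1, x_2, \ldots, x_n)$; $\mu$, Lebesgue
measure; $\Theta$, the set of real numbers; and

$$p_\theta(x) = \frac{1}{(2\pi)^{n/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2 \right\}.$$

And finally, let $\theta_0 = 0$. Then

$$\pi_\theta(x) = \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (-2\theta x_i + \theta^2) \right\}.$$

If $0 < b < \frac{1}{2}$, and we define

$$\varphi_1(x) = (1 - 2b)^{n/2} \exp\left\{ b \sum_{i=1}^{n} x_i^2 \right\} - 1,$$

we have, for each $\theta$,

$$\int_\Omega \varphi_1(x) p_\theta(x) \, d\mu = \exp\left\{ \frac{nb}{1 - 2b} \theta^2 \right\} - 1.$$

Thus, $\varphi_1$ is an unbiased estimate of the function $h$:

$$h(\theta) = \exp\left\{ \frac{nb}{1 - 2b} \theta^2 \right\} - 1.$$

If we examine

$$\|\varphi_1\|_s^s = \frac{1}{(2\pi)^{n/2}} \int_\Omega \left| (1 - 2b)^{n/2} \exp\left\{ b \sum_{i=1}^{n} x_i^2 \right\} - 1 \right|^s \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} x_i^2 \right\} d\mu,$$

we find that this integral converges only for $s < 1/2b$. Shifting the emphasis,
we may state: for the function $h_\alpha$, defined by

$$h_\alpha(\theta) = e^{\alpha \theta^2} - 1, \quad \alpha > 0,$$

there exists an unbiased estimate with finite $s$th moment at $\theta = 0$, for each

$$s < \frac{n + 2\alpha}{2\alpha}.$$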

Next, observe that

$$\|\pi_\theta\|_r^r = \frac{1}{(2\pi)^{n/2}} \int_\Omega \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i^2 - 2r\theta x_i + r\theta^2) \right\} d\mu = \exp\left\{ \tfrac{1}{2} n r (r - 1) \theta^2 \right\},$$

so that the $\pi_\theta$ are elements of $\mathfrak{L}_r$ for each $r > 1$. The ratio

$$\frac{|h_\alpha(\theta)|}{\|\pi_\theta\|_r} = (e^{\alpha \theta^2} - 1) \exp\left\{ -\tfrac{1}{2} n (r - 1) \theta^2 \right\}$$

is seen to diverge as $\theta \to \infty$, if

$$\tfrac{1}{2} n (r - 1) < \alpha.$$

Hence, by Theorem 2, there exists no unbiased estimate of $h_\alpha$ belonging to $\mathfrak{L}_s$
for a value of $s$ such that the number

$$r = \frac{s}{s - 1}$$

satisfies the inequality just above; that is, for a value of $s$ greater than

$$\frac{n + 2\alpha}{2\alpha}.$$

Otherwise stated: there exists no unbiased estimate of $h_\alpha$ with finite $s$th moment at
$\theta = 0$, for

$$s > \frac{n + 2\alpha}{2\alpha}.$$

It is most likely true that this last statement holds, in general, with

$$s \geq \frac{n + 2\alpha}{2\alpha}.$$

We shall consider here only the case

$$\frac{n + 2\alpha}{2\alpha} = 2;$$

and since the analysis is the same for every pair $n$, $\alpha$ satisfying this equality,
we treat the particular case of

$$n = 1, \quad \alpha = \tfrac{1}{2}.$$

Thus, we shall show: for $n = 1$, there exists no unbiased estimate of $h_{1/2}$,

$$h_{1/2}(\theta) = e^{\theta^2/2} - 1,$$

with finite variance at $\theta = 0$.
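We must show that the ratios

$$\frac{\left| \sum_{i=1}^{m} a_i (e^{\theta_i^2/2} - 1) \right|}{\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} \right\|_2}$$

are not bounded for all choices of $m$ (distinct) $\theta_i$'s, and all sets of $m$ real numbers
$a_i$, and all $m$. This is clearly equivalent to showing the same for the ratios

$$Q(m, a_i, \theta_i) = \frac{\left| \sum_{i=1}^{m} a_i (1 - e^{-\theta_i^2/2}) \right|}{\left\| \sum_{i=1}^{m} a_i e^{-\theta_i^2/2} \pi_{\theta_i} \right\|_2}.$$

Now we find, by direct computation,

$$\left\| \sum_{i=1}^{m} a_i e^{-\theta_i^2/2} \pi_{\theta_i} \right\|_2^2 = \sum_{i,j=1}^{m} e^{-\frac{1}{2}(\theta_i - \theta_j)^2} a_i a_j.$$

And the solution of the familiar extremum problem:

$$\sup_{(a_i)} \left[ \sum_{i=1}^{m} a_i (1 - e^{-\theta_i^2/2}) \right]^2 \quad \text{subject to} \quad \sum_{i,j=1}^{m} e^{-\frac{1}{2}(\theta_i - \theta_j)^2} a_i a_j = 1$$

yields

$$\sup_{(a_i)} Q^2(m, a_i, \theta_i) = \sum_{i,j=1}^{m} v_{ij} (1 - e^{-\theta_i^2/2})(1 - e^{-\theta_j^2/2}),$$

where the matrix

$$V = (v_{ij}), \quad i, j = 1, 2, \ldots, m,$$

is the inverse of the matrix

$$U = \left( e^{-\frac{1}{2}(\theta_i - \theta_j)^2} \right), \quad i, j = 1, 2, \ldots, m.$$

We now take, in particular,

$$\theta_i = it, \quad i = 1, 2, \ldots, m,$$

where $t$ is a positive number. Clearly, there exists a number $t_0$ such that for
$t > t_0$,

$$U(t) = \left( e^{-\frac{1}{2}(i - j)^2 t^2} \right)$$

is non-singular. Also,

$$\lim_{t \to \infty} U(t) = I,$$

the identity matrix. Then, for $t > t_0$, $V = U^{-1}$ is a continuous function of $U$,
so that

$$\lim_{t \to \infty} V(t) = \left( \lim_{t \to \infty} U(t) \right)^{-1} = I.$$

Hence,

$$\lim_{t \to \infty} v_{ij}(t) = \delta_{ij}.$$

It follows that

$$\lim_{t \to \infty} \sup_{(a_i)} Q^2(m, a_i, it) = m,$$

and therefore,

$$\sup_{(a_i, \theta_i)} Q^2(m, a_i, \theta_i) \geq m.$$

(A simple argument on the characteristic values of $U$ shows that there is actually
equality here.) This result gives the unboundedness of the ratios $Q$; and our
proposition is proved, by virtue of Theorem 2.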
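
The limit above can be observed numerically. The following sketch, with hypothetical function names and a small hand-written linear solver so that it needs no external library, evaluates $\sup_{(a_i)} Q^2(m, a_i, it)$ as the quadratic form of the vector $(1 - e^{-\theta_i^2/2})$ in $V = U^{-1}$, for $\theta_i = it$.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def sup_q_squared(m, t):
    """sup over (a_i) of Q^2(m, a_i, theta_i) for theta_i = i * t."""
    theta = [i * t for i in range(1, m + 1)]
    u = [[math.exp(-0.5 * (ti - tj) ** 2) for tj in theta] for ti in theta]
    w = [1.0 - math.exp(-0.5 * ti ** 2) for ti in theta]
    y = solve(u, w)  # y = V w, so the value sought is w . y
    return sum(wi * yi for wi, yi in zip(w, y))


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 6.0):
        print(t, sup_q_squared(4, t))  # approaches m = 4 as t grows
```

For small $t$ the matrix $U(t)$ is close to singular, so the values printed for the smallest spacings are numerically less reliable than those for large $t$.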


APPENDIX 
The spaces $\mathfrak{L}_r$ and $\mathfrak{L}_s$ are instances of a Banach space over the reals; that is, a
complete, normed, linear vector space, closed under multiplication by real
numbers. That the space, say $\mathfrak{B}$, is normed is to say that there is a non-negative,
real-valued function, $\|\ \|$, defined on $\mathfrak{B}$, with the properties:

$$\|\xi\| = 0 \text{ if and only if } \xi \text{ is the null vector},$$
$$\|a\xi\| = |a| \cdot \|\xi\|,$$
$$\|\xi + \eta\| \leq \|\xi\| + \|\eta\|,$$

where $\xi, \eta \in \mathfrak{B}$ and $a$ is real. The number $\|\xi\|$ is called the norm of $\xi$.

The function $\|\xi - \eta\|$ on pairs $\xi, \eta$ of vectors is a distance function in the
usual sense. With it, strong convergence (or simply convergence) is defined in $\mathfrak{B}$: $\xi_n$
converges strongly to $\xi$ when $\lim_{n \to \infty} \|\xi_n - \xi\| = 0$. In symbols: $\xi_n \to \xi$ or $\lim \xi_n = \xi$.
The usual set-theoretic notions are now defined in the obvious way; e.g., limit
point of a set, closed set, etc. That the space $\mathfrak{B}$ is complete means that every
sequence $\{\xi_n\}$ satisfying $\lim_{m,n \to \infty} \|\xi_m - \xi_n\| = 0$ converges to a (unique) element
$\xi \in \mathfrak{B}$.

A linear manifold $\mathfrak{M}$ in $\mathfrak{B}$ is a subset of $\mathfrak{B}$ with the property that for any two
elements $\xi, \eta \in \mathfrak{M}$ and any two real numbers $a$, $b$, we have also $a\xi + b\eta \in \mathfrak{M}$.
A closed linear manifold is a linear manifold that is closed in the set-theoretic
sense. If $S$ is any subset of $\mathfrak{B}$, then the set, $[S]$, of all finite linear combinations of
elements of $S$ is a linear manifold; it is the linear manifold spanned by $S$. The
closure of $[S]$, denoted by $\{S\}$, is called the closed linear manifold spanned by $S$.
In general, $[S]$ is a proper subset of $\{S\}$. A set $S \subset \mathfrak{B}$ is called fundamental
when $\{S\} = \mathfrak{B}$.

A linear functional, $G$, on $\mathfrak{B}$ is a real-valued function with the property
that for any two elements $\xi, \eta \in \mathfrak{B}$ and any two real numbers $a$, $b$, we have
$G(a\xi + b\eta) = aG(\xi) + bG(\eta)$. The linear functional $G$ is said to be bounded when
the number

$$\|G\| = \sup_{\xi \neq 0} \frac{|G(\xi)|}{\|\xi\|}$$

is finite. $\|G\|$ is called the norm of $G$. (Throughout the text of the paper, the
qualification "bounded" has been understood in all references to linear
functionals.) If we define the sum of two linear functionals $F$ and $G$ by
$(F + G)(\xi) = F(\xi) + G(\xi)$, and make the other requisite definitions in the obvious way,
we find that the bounded linear functionals on $\mathfrak{B}$ form a linear vector space
over the reals. The function $\|\ \|$ on the bounded linear functionals, which we
have already called a norm, is in fact a norm in the Banach space sense. This
vector space, so normed, is readily shown to be complete. Hence it is a Banach
space, usually called the conjugate space to $\mathfrak{B}$. It is this space we have referred
to in the text as the space of linear functionals on $\mathfrak{B}$.

If a sequence $\{\xi_n\}$ of elements of $\mathfrak{B}$ has the property that $\lim_{n \to \infty} G(\xi_n) = G(\xi)$
for every bounded linear functional $G$, then $\xi_n$ is said to converge weakly to $\xi$.
If, of the sequence $\{\xi_n\}$, we know only that $\lim_{n \to \infty} G(\xi_n)$ exists for every bounded
linear functional, we say simply that the sequence is weakly convergent. The
space $\mathfrak{B}$ is called weakly complete if every weakly convergent sequence converges
weakly to a limit. The spaces $\mathfrak{L}_r$, $r \geq 1$, are weakly complete. $\mathfrak{B}$ is said to be
weakly compact if every bounded set $S \subset \mathfrak{B}$ contains a weakly convergent
sequence. That $S$ is "bounded" means $\sup_{\xi \in S} \|\xi\| < \infty$.
€S 


A real Hilbert space $\mathfrak{H}$ is a real Banach space on which there is defined an
inner product; that is, a function $(\xi, \eta)$ on pairs of elements $\xi, \eta$, with the
properties

$$(\xi, \eta) = (\eta, \xi),$$
$$(a\xi, \eta) = a(\xi, \eta),$$
$$(\xi + \zeta, \eta) = (\xi, \eta) + (\zeta, \eta),$$
$$\|\xi\|^2 = (\xi, \xi).$$

The inner product is a continuous function of both its arguments; i.e., $\lim \xi_m = \xi$
and $\lim \eta_n = \eta$ imply $\lim (\xi_m, \eta_n) = (\xi, \eta)$. The space $\mathfrak{L}_2$ in the text is a Hilbert
space when we take $(\xi, \eta) = \int_\Omega \xi \eta \, d\nu$. Two elements $\xi, \eta$ which are such that
$(\xi, \eta) = 0$ are said to be orthogonal. If $S$ is any set in $\mathfrak{H}$, then the set of elements
of $\mathfrak{H}$ each of which is orthogonal to every element of $S$ is called the orthocomplement
of $S$, and is denoted by $S^\perp$.

For further elaboration the reader is referred to [13] and [19]. 


REFERENCES 


[1] D. Blackwell and M. A. Girshick, "A lower bound for the variance of some unbiased
sequential estimates," Annals of Math. Stat., Vol. 18 (1947), pp. 277-280.

[2] R. A. Fisher, "Theory of statistical estimation," Camb. Phil. Soc. Proc., Vol. 22
(1925), pp. 700-725.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton Press, Princeton, 1946.

[4] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation," Sankhyā, Vol. 8 (1946), pp. 1-14.

[5] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain
binomial sampling problems," Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[6] J. Neyman, "Su un teorema concernente le cosidette statistiche sufficienti," Giornale
dell'Istituto Italiano degli Attuari, Vol. 6 (1935), pp. 320-334.

[7] D. Blackwell, "Conditional expectation and unbiased sequential estimation,"
Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.

[8] H. Cramér, "A contribution to the theory of statistical estimation," Skand. Aktuar.
tids., Vol. 29 (1946), pp. 85-94.

[9] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation (cont'd)," Sankhyā, Vol. 8 (1947), pp. 201-218.

[10] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), pp. 215-230.

[11] F. Riesz, Les Systèmes d'Équations Linéaires à une Infinité d'Inconnues, Gauthier-
Villars, Paris, 1913.

[12] F. Riesz, "Untersuchungen über Systeme integrierbarer Funktionen," Math. Annalen,
Vol. 69 (1910), pp. 449-497.

[13] S. Banach, Théorie des Opérations Linéaires, Garasinski, Warsaw, 1932.

[14] N. Dunford, "Uniformity in linear spaces," Am. Math. Soc. Trans., Vol. 44 (1938),
pp. 305-356.

[15] M. H. Steinhaus, "Additive und stetige Funktionaloperationen," Math. Zeits., Vol.
5 (1918), pp. 186-221.

[16] F. Riesz, "Sur une espèce de géométrie analytique des systèmes de fonctions
sommables," Comptes Rendus, Vol. 144 (1907), pp. 1409-1411.

[17] M. Fréchet, "Sur les ensembles de fonctions et les opérations linéaires," Comptes
Rendus, Vol. 144 (1907), pp. 1414-1416.

[18] S. Saks, Theory of the Integral, Stechert, New York, 1937.

[19] J. von Neumann, Functional Operators, (Mimeographed notes) Princeton, 1935.

[20] G. R. Seth, "On the variance of estimates," Annals of Math. Stat., Vol. 20 (1949),
pp. 1-27.

[21] B. J. Pettis, "A note on regular Banach spaces," Am. Math. Soc. Bull., Vol. 44 (1938),
pp. 420-428.











A SEQUENTIAL DECISION PROCEDURE FOR CHOOSING ONE OF 
THREE HYPOTHESES CONCERNING THE UNKNOWN 
MEAN OF A NORMAL DISTRIBUTION 


By Milton Sobel and Abraham Wald¹


Columbia University 


1. Introduction. In this paper a multi-decision problem is investigated from 
a sequential viewpoint and compared with the best non-sequential procedure 
available. Multi-decision problems occur often in practice but methods to deal 
with such problems are not yet sufficiently developed. 

The problem under consideration here is a 3-decision problem: Given a chance
variable which is normally distributed with known variance $\sigma^2$, but unknown
mean $\theta$, and given two real numbers $a_1 < a_2$, the problem is to choose one of the
three mutually exclusive and exhaustive hypotheses

$$H_1: \theta < a_1 \qquad H_2: a_1 \leq \theta \leq a_2 \qquad H_3: \theta > a_2.$$


In order to select a proper sequential decision procedure, the parameter space
is subdivided into 5 mutually exclusive and exhaustive zones in the following
manner. Around $a_1$ there exists an interval $(\theta_1, \theta_2)$ in which we have no strong
preference between $H_1$ and $H_2$ but prefer (strongly) to reject $H_3$. Around $a_2$
there exists an interval $(\theta_3, \theta_4)$ in which we have no strong preference between
$H_2$ and $H_3$ but prefer (strongly) to reject $H_1$. For $\theta \leq \theta_1$ we prefer to accept $H_1$.
For $\theta_2 \leq \theta \leq \theta_3$ we prefer to accept $H_2$. For $\theta \geq \theta_4$ we prefer to accept $H_3$.

The intervals $(\theta_1, \theta_2)$ and $(\theta_3, \theta_4)$ will be called indifference zones. The
determination of these indifference zones is not a statistical problem but should
be made on practical considerations concerning the consequences of a wrong
decision.

In accordance with the above we define a wrong decision in the following
way. For $\theta \leq \theta_1$, acceptance of $H_2$ or $H_3$ is wrong. For $\theta_1 < \theta < \theta_2$, acceptance of
$H_3$ is wrong. For $\theta_2 \leq \theta \leq \theta_3$, acceptance of $H_1$ or $H_3$ is wrong. For $\theta_3 < \theta < \theta_4$,
acceptance of $H_1$ is wrong. For $\theta \geq \theta_4$, acceptance of $H_1$ or $H_2$ is wrong.

The requirements on our decision procedure necessary to limit the probability 
of a wrong decision are investigated. Two cases are considered. 


Case 1: Prob. of a wrong decision $\leq \gamma$ for all $\theta$.

Case 2: Prob. of a wrong decision $\leq \gamma_1$ for $\theta \leq \theta_1$,
        Prob. of a wrong decision $\leq \gamma_2$ for $\theta_1 < \theta < \theta_4$,
        Prob. of a wrong decision $\leq \gamma_3$ for $\theta \geq \theta_4$.


The decision procedure discussed in the present paper is not an optimum 
procedure since, as will be seen later, the final decision at the termination of 


¹ Work done under the sponsorship of the Office of Naval Research.




experimentation is not in every case a function of only "the sample mean of all
the observations", although the sample mean is a sufficient statistic for $\theta$.
Although the procedure considered is not optimal it is suggested for the following
reasons:

1. The decision procedure can be carried out simply. In fact tables can be con- 
structed before experimentation starts that render the procedure completely 
mechanical. 

2. The derivation of the operating characteristic (OC) function, neglecting the 
excess of the cumulative sum over the boundary, is accomplished with little 
difficulty. In general, for other multi-decision problems it is unknown how to 
obtain the OC function. 

3. It is believed that the loss of efficiency is not serious; i.e., the suggested 
sequential procedure is not far from being optimum. In this connection a non- 
sequential procedure is compared with this sequential procedure. The results 
show that, for the same maximum probability of making a wrong decision, the 
sequential procedure requires on the average substantially fewer observations to 
reach a final decision. In fact, for Case 1 noted above, if $.008 < \gamma < .1$, and if
certain symmetrical features are assumed, then the fixed number of observations
required by the non-sequential method is greater than the maximum of the
average sample number (ASN) function taken over all values of $\theta$.

It was found necessary in the course of the investigation to put an upper bound 
on the quantity ot in order that the methods used to obtain upper and lower 

2 — ay 
bounds for the ASN function should give close results. This restriction, however, 
is likely to be satisfied in practical applications. 

All formulas for ASN and OC functions which will be used in this paper will be 
approximation formulas neglecting the excess of the cumulative sum over the 
boundaries. Nevertheless, equality signs will be used in these formulas, except 
when additional approximations are involved. 


2. Description of the Decision Procedure.² We shall assume that the
indifference zones described above have the following properties

(i) $\theta_1 < a_1 < \theta_2 \leq \theta_3 < a_2 < \theta_4$

(ii) $\theta_1 + \theta_2 = 2a_1$; $\theta_3 + \theta_4 = 2a_2$

(iii) $\theta_2 - \theta_1 = \theta_4 - \theta_3 = \Delta$ (say).


² A similar decision procedure was used by P. Armitage [2] as an alternative to the
sequential $t$ test (with 2-sided alternatives). The form used there is more restricted as he
considers only the case $\theta_2 = \theta_3$. Essential inequalities on the OC function are pointed out
but no attempt is made to determine the complete OC and ASN functions. A closely related
but somewhat different procedure for dealing with a trichotomy was suggested by Milton
Friedman while he was a member of the Statistical Research Group of Columbia University.
As far as the authors are aware, no results were obtained concerning the OC and ASN
functions of Friedman's procedure.















Let $R_1$ denote the Sequential Probability Ratio Test for testing the hypothesis
that $\theta = \theta_1$ against the hypothesis that $\theta = \theta_2$. We assume for the present that
either the proper constants $A$, $B$ in the probability ratio test are given or that
they are approximated from given $\alpha$, $\beta$ by the relations

$$A = \frac{1 - \beta}{\alpha}, \qquad B = \frac{\beta}{1 - \alpha}.$$

Here $\alpha$ and $\beta$ are upper bounds on the probabilities of first and second types of
errors, respectively.

Let $R_2$ represent the S.P.R.T. for testing the hypothesis that $\theta = \theta_3$ against the
alternative that $\theta = \theta_4$. For this test we assume that $(\alpha, \beta, A, B)$ are replaced
by $(\hat{\alpha}, \hat{\beta}, \hat{A}, \hat{B})$ and, as before, that either $\hat{A}$ and $\hat{B}$ are given or that they are
approximated from given $\hat{\alpha}$, $\hat{\beta}$.

The decision procedure is carried out as follows: 

Both $R_1$ and $R_2$ are computed at each stage of the inspection until

Either: One ratio leads to a decision to stop before the other. Then the former
is no longer computed and the latter is continued until it leads to a decision to
stop.

Or: Both $R_1$ and $R_2$ lead to a decision to stop at the same stage. In this event
both computations are discontinued.

The following table gives the rule $R$ for the decisions to be made corresponding
to all possible outcomes of $R_1$ and $R_2$.

          $R_1$                       $R_2$                        $R$

    If  accepts $\theta_1$   and   accepts $\theta_3$   then   accepts $H_1$
    If  accepts $\theta_2$   and   accepts $\theta_3$   then   accepts $H_2$
    If  accepts $\theta_2$   and   accepts $\theta_4$   then   accepts $H_3$





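We shall show that acceptance of both $\theta_1$ and $\theta_4$ is impossible when $(A, B) =
(\hat{A}, \hat{B})$. For this purpose we need the acceptance number and rejection number
formulas. (See page 119 of [1].) In each line below the left-hand member is the
Acceptance Number and the right-hand member is the Rejection Number.

$$R_1: \quad \frac{\sigma^2}{\Delta} \log B + a_1 n < \sum_{\alpha=1}^{n} x_\alpha < \frac{\sigma^2}{\Delta} \log A + a_1 n$$

$$R_2: \quad \frac{\sigma^2}{\Delta} \log \hat{B} + a_2 n < \sum_{\alpha=1}^{n} x_\alpha < \frac{\sigma^2}{\Delta} \log \hat{A} + a_2 n.$$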






We shall assume for convenience that "between observations" $R_1$ is tested before
$R_2$ and let the term "initial decision" refer to the first decision made.

Assume $\theta_1$ and $\theta_4$ are both accepted. Then if $\theta_1$ is accepted initially at the $m$th
stage

$$\sum_{\alpha=1}^{m} x_\alpha \leq \frac{\sigma^2}{\Delta} \log B + a_1 m.$$

Since

$$\frac{\sigma^2}{\Delta} \log B + a_1 m < \frac{\sigma^2}{\Delta} \log B + a_2 m$$

it follows that $\theta_4$ is rejected at the same stage, contradicting the hypothesis.
Similarly if $\theta_4$ is accepted initially at the $m$th stage, then

$$\sum_{\alpha=1}^{m} x_\alpha \geq \frac{\sigma^2}{\Delta} \log A + a_2 m.$$

Since

$$\frac{\sigma^2}{\Delta} \log A + a_2 m > \frac{\sigma^2}{\Delta} \log A + a_1 m$$

it follows that $\theta_1$ is rejected at the same or at an earlier stage, contradicting the
assumption that the acceptance of $\theta_4$ is an initial decision. Hence $\theta_1$ and $\theta_4$ cannot
both be accepted.

A geometrical representation of the rule $R$ is given in Figure 1.

$R$ can now be described as follows: Continue taking observations until an
acceptance region (shaded area) is reached or both dashed lines are crossed. In
the former case, stop and accept as shown above. In the latter case stop and
accept $H_2$.

The proof above that $\theta_1$ and $\theta_4$ cannot both be accepted consists of noting that
a point below the acceptance line for $\theta_1$ is already below the rejection line for
$\theta_4$ and that a point above the acceptance line for $\theta_4$ is already above the rejection
line for $\theta_1$.
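
The rule $R$ can be sketched in a few lines of code. The following is a minimal illustration, not the authors' tabular scheme: the function name and argument names are hypothetical, a boundary is treated as reached when the cumulative sum equals it, and the two tests use the same constants unless separate ones are supplied.

```python
import math


def sobel_wald_decide(xs, sigma2, a1, a2, delta, A, B, A_hat=None, B_hat=None):
    """Apply the rule R to the observations xs.

    Returns (accepted hypothesis, stage), or (None, len(xs)) if the
    observations run out before both tests have stopped.
    """
    A_hat = A if A_hat is None else A_hat
    B_hat = B if B_hat is None else B_hat
    k = sigma2 / delta
    r1 = r2 = None      # index of the theta accepted by R_1 and by R_2
    total = 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        if r1 is None:                                  # R_1: theta_1 vs theta_2
            if total <= k * math.log(B) + a1 * n:
                r1 = 1
            elif total >= k * math.log(A) + a1 * n:
                r1 = 2
        if r2 is None:                                  # R_2: theta_3 vs theta_4
            if total <= k * math.log(B_hat) + a2 * n:
                r2 = 3
            elif total >= k * math.log(A_hat) + a2 * n:
                r2 = 4
        if r1 is not None and r2 is not None:
            table = {(1, 3): "H1", (2, 3): "H2", (2, 4): "H3"}
            # The pair (1, 4) cannot occur when the constants are equal.
            return table.get((r1, r2)), n
    return None, len(xs)


if __name__ == "__main__":
    # Assumed illustrative values: sigma^2 = 1, a_1 = -1, a_2 = 1, Delta = 1.
    A, B = 19.0, 1.0 / 19.0
    print(sobel_wald_decide([0.0] * 10, 1.0, -1.0, 1.0, 1.0, A, B))
```

With these assumed values a run of observations equal to zero crosses both inner boundaries at the third stage, so the rule stops there and accepts $H_2$.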

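If $(A, B) \neq (\hat{A}, \hat{B})$, a necessary and sufficient condition for the impossibility
of accepting $\theta_1$ and $\theta_4$ is that at $n = 1$ the following inequalities should hold.

Rejection Number (of $\theta_1$) for $R_1$ $\leq$ Rejection Number (of $\theta_3$) for $R_2$
and
Acceptance Number (of $\theta_1$) for $R_1$ $\leq$ Acceptance Number (of $\theta_3$) for $R_2$.

In symbols

$$\frac{\sigma^2}{\Delta} \log A + a_1 \leq \frac{\sigma^2}{\Delta} \log \hat{A} + a_2,$$

$$\frac{\sigma^2}{\Delta} \log B + a_1 \leq \frac{\sigma^2}{\Delta} \log \hat{B} + a_2.$$

These can be written as

$$\frac{A}{\hat{A}} \leq e^{d\Delta/\sigma^2}, \qquad \frac{B}{\hat{B}} \leq e^{d\Delta/\sigma^2},$$

respectively, where $d = a_2 - a_1$.

Since $d\Delta/\sigma^2 > 0$, the above inequalities are certainly fulfilled when

$$\frac{B}{\hat{B}} \leq 1 \quad \text{and} \quad \frac{A}{\hat{A}} \leq 1. \tag{2.1}$$

In what follows in this paper, we shall restrict ourselves to cases where
acceptance of both $\theta_1$ and $\theta_4$ is impossible, even if this is not stated explicitly.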








FIGURE 1 


3. Derivation of OC Functions. Let $L(H_i \mid \theta, R)$ denote the probability of
accepting $H_i$ when $\theta$ is the true mean and $R$ is the sequential rule used. Let $H_{\theta_i}$
denote the hypothesis that $\theta = \theta_i$. Since, as shown above, $H_1$ is accepted if and
only if $\theta_1$ is accepted, we have

$$L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1). \tag{3.1}$$

Similarly,

$$L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2). \tag{3.2}$$







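From the fact that $R_1$ and $R_2$ each terminate at some finite stage with probability one, it follows that $R$ will terminate at some finite stage with probability one. Hence

(3.3)   $L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R)$.

From pp. 50-52 of [1], the following equations are obtained.

(3.4)   $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1) = \dfrac{A^{h_1} - 1}{A^{h_1} - B^{h_1}}$

where

$h_1 = h_1(\theta) = \dfrac{\theta_1 + \theta_2 - 2\theta}{\theta_2 - \theta_1} = \dfrac{a_1 - \theta}{\Delta/2}$

and

(3.5)   $L(H_{\theta_3} \mid \theta, R_2) = \dfrac{\hat{A}^{h_2} - 1}{\hat{A}^{h_2} - \hat{B}^{h_2}}$

where

$h_2 = h_2(\theta) = \dfrac{\theta_3 + \theta_4 - 2\theta}{\theta_4 - \theta_3} = \dfrac{a_2 - \theta}{\Delta/2}$.

These equations involve an approximation, as explained in [1]. Hence

(3.6)   $L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2) = 1 - L(H_{\theta_3} \mid \theta, R_2) = \dfrac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}}$

and

(3.7)   $L(H_2 \mid \theta, R) = 1 - \dfrac{A^{h_1} - 1}{A^{h_1} - B^{h_1}} - \dfrac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}} = \dfrac{1 - B^{h_1}}{A^{h_1} - B^{h_1}} - \dfrac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}}$.

Since $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1)$, it follows that $L(H_1 \mid \theta, R)$ is a monotonically decreasing function of $\theta$ and that

$L(H_1 \mid -\infty, R) = 1$;   $L(H_1 \mid \infty, R) = 0$

$L(H_1 \mid \theta_1, R) = 1 - \alpha$;   $L(H_1 \mid \theta_2, R) = \beta$

$L(H_1 \mid a_1, R) = \dfrac{\log A}{\log A + |\log B|}$.

Similarly, since $L(H_3 \mid \theta, R) = 1 - L(H_{\theta_3} \mid \theta, R_2)$, it follows that $L(H_3 \mid \theta, R)$ is a monotonically increasing function of $\theta$ and that

$L(H_3 \mid -\infty, R) = 0$;   $L(H_3 \mid \infty, R) = 1$

$L(H_3 \mid \theta_3, R) = \hat{\alpha}$;   $L(H_3 \mid \theta_4, R) = 1 - \hat{\beta}$

$L(H_3 \mid a_2, R) = \dfrac{|\log \hat{B}|}{\log \hat{A} + |\log \hat{B}|}$.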
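The OC formulas (3.3) to (3.6) can be evaluated numerically. The following is a minimal sketch, not the authors' computation: the function names are hypothetical, the second pair of constants is passed as `A_hat`, `B_hat`, and the limiting value at $h = 0$ is taken to be $\log A / (\log A + |\log B|)$ as stated above.

```python
import math

def accept_lower(h, upper, lower):
    """Wald's approximation (A^h - 1) / (A^h - B^h) for accepting the lower value."""
    if abs(h) < 1e-12:
        # Limiting value as h -> 0.
        return math.log(upper) / (math.log(upper) - math.log(lower))
    return (upper ** h - 1.0) / (upper ** h - lower ** h)

def oc_functions(theta, thetas, A, B, A_hat, B_hat):
    """Return (L(H1), L(H2), L(H3)) at the true mean theta, following (3.3)-(3.6)."""
    t1, t2, t3, t4 = thetas
    h1 = (t1 + t2 - 2.0 * theta) / (t2 - t1)
    h2 = (t3 + t4 - 2.0 * theta) / (t4 - t3)
    L1 = accept_lower(h1, A, B)                    # (3.4)
    L3 = 1.0 - accept_lower(h2, A_hat, B_hat)      # (3.6)
    L2 = 1.0 - L1 - L3                             # (3.3)
    return L1, L2, L3
```

For symmetric constants ($A = \hat{A} = 1/B = 1/\hat{B}$) and symmetric $\theta_i$, the sketch returns $L(H_1) = L(H_3)$ at the midpoint, as expected.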







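Since $L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R)$ it follows easily from the above results that

$L(H_2 \mid -\infty, R) = 0$;   $L(H_2 \mid \infty, R) = 0$

$L(H_2 \mid \theta, R) < \alpha$ for $\theta < \theta_1$;   $L(H_2 \mid \theta, R) < \hat{\beta}$ for $\theta > \theta_4$

$\dfrac{|\log B|}{\log A + |\log B|} - \hat{\alpha} < L(H_2 \mid a_1, R) < \dfrac{|\log B|}{\log A + |\log B|}$

$\dfrac{\log \hat{A}}{\log \hat{A} + |\log \hat{B}|} - \beta < L(H_2 \mid a_2, R) < \dfrac{\log \hat{A}}{\log \hat{A} + |\log \hat{B}|}$

$1 - \beta - \hat{\alpha} < L(H_2 \mid \theta, R) < 1$ for $\theta_2 \le \theta \le \theta_3$.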


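4. Probability of Correct Decision. Denote the probability of a correct decision by $L(\theta \mid R)$. It is defined as follows:

Interval                               Correct Decisions                 $L(\theta \mid R)$
$\theta \le \theta_1$                  acceptance of $H_1$               $L(H_1 \mid \theta, R)$
$\theta_1 < \theta < \theta_2$         acceptance of $H_1$ or $H_2$      $L(H_1 \mid \theta, R) + L(H_2 \mid \theta, R)$
$\theta_2 \le \theta \le \theta_3$     acceptance of $H_2$               $L(H_2 \mid \theta, R)$
$\theta_3 < \theta < \theta_4$         acceptance of $H_2$ or $H_3$      $L(H_2 \mid \theta, R) + L(H_3 \mid \theta, R)$
$\theta_4 \le \theta$                  acceptance of $H_3$               $L(H_3 \mid \theta, R)$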


It should be noted that at points of discontinuity, $L(\theta \mid R)$ is defined as the smaller of the two limiting values.

We shall now discuss some monotonicity properties of the function $L(\theta \mid R)$. From the fact that $L(H_{\theta_1} \mid \theta, R_1)$ and $L(H_{\theta_3} \mid \theta, R_2)$ are continuous with continuous first and second derivatives and are monotonically decreasing for all $\theta$ with a single point of inflection in the intervals $\theta_1 < \theta < \theta_2$ and $\theta_3 < \theta < \theta_4$ respectively, it follows that

(i) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $-\infty < \theta \le \theta_1$.

(ii) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_4 \le \theta < \infty$.

Making use of (3.3) we have further

(iii) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $\theta_1 < \theta < \theta_2$.

(iv) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_3 < \theta < \theta_4$.

(v) For $\theta_2 \le \theta \le \theta_3$, $\dfrac{d}{d\theta} L(\theta \mid R) = -\left[\dfrac{d}{d\theta} L(H_1 \mid \theta, R) + \dfrac{d}{d\theta} L(H_3 \mid \theta, R)\right]$ is decreasing, since $\dfrac{d}{d\theta} L(H_1 \mid \theta, R)$ and $\dfrac{d}{d\theta} L(H_3 \mid \theta, R)$ are increasing. In other words $L(\theta \mid R)$ has negative curvature for $\theta_2 \le \theta \le \theta_3$.

In the special case when $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}}$ and the origin is taken at $\dfrac{a_1 + a_2}{2}$ for the sake of convenience, it is easy to see that $L(\theta \mid R)$ is symmetric with respect to the origin and, because of (v), has a local maximum at $\theta = 0$.

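5. Choice of the constants $A$, $B$, $\hat{A}$, $\hat{B}$ to insure prescribed Lower Bounds for $L(\theta \mid R)$. We shall deal here with the question of choosing $A$, $B$, $\hat{A}$ and $\hat{B}$ such that $L(\theta \mid R) \ge 1 - \gamma_1$ when $\theta \le \theta_1$, $L(\theta \mid R) \ge 1 - \gamma_2$ when $\theta_2 \le \theta \le \theta_3$, and $L(\theta \mid R) \ge 1 - \gamma_3$ when $\theta \ge \theta_4$. From the monotonic properties of the correct decision function it is only necessary to insure that

(5.1)   $L(\theta_1 \mid R) = 1 - \gamma_1$,   $L(\theta_2 \mid R) = L(\theta_3 \mid R) = 1 - \gamma_2$   and   $L(\theta_4 \mid R) = 1 - \gamma_3$.

The following relations will be needed:

$h_1(\theta_1) = h_2(\theta_3) = 1 = -h_1(\theta_2) = -h_2(\theta_4)$

$h_2(\theta_2) = \dfrac{\theta_3 + \theta_4 - 2\theta_2}{\Delta} = \dfrac{d - \Delta/2}{\Delta/2} = r$ (say)

$h_1(\theta_3) = \dfrac{\theta_1 + \theta_2 - 2\theta_3}{\Delta} = -\dfrac{d - \Delta/2}{\Delta/2} = -r$

where $\Delta = \theta_4 - \theta_3 = \theta_2 - \theta_1$ and $d = a_2 - a_1$.

The following four equations are obtained from (5.1):

(5.2)   $1 - L(H_1 \mid \theta_1, R) = L(H_{\theta_2} \mid \theta_1, R_1) = \dfrac{1 - B}{A - B} = \gamma_1$

(5.3)   $1 - L(H_2 \mid \theta_2, R) = L(H_1 \mid \theta_2, R) + L(H_3 \mid \theta_2, R) = \dfrac{B(A - 1)}{A - B} + \left[\dfrac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r}\right] = \gamma_2$

(5.4)   $1 - L(H_2 \mid \theta_3, R) = L(H_3 \mid \theta_3, R) + L(H_1 \mid \theta_3, R) = \dfrac{1 - \hat{B}}{\hat{A} - \hat{B}} + \left[\dfrac{B^r(A^r - 1)}{A^r - B^r}\right] = \gamma_2$

(5.5)   $1 - L(H_3 \mid \theta_4, R) = L(H_{\theta_3} \mid \theta_4, R_2) = \dfrac{\hat{B}(\hat{A} - 1)}{\hat{A} - \hat{B}} = \gamma_3$.

The "bracketed terms" represent quantities less than $\hat{\alpha}$ and $\beta$ respectively and if $r$ is sufficiently large they can be neglected. This will be made more precise but first let us note the results of neglecting the bracketed terms.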

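From (5.2) and (5.3) we obtain

(5.6)   $B(1 - \gamma_1) = \gamma_2$, whence $B = \dfrac{\gamma_2}{1 - \gamma_1}$.

From (5.2) and (5.6)

(5.7)   $A = \dfrac{1 - B(1 - \gamma_1)}{\gamma_1}$, whence $A = \dfrac{1 - \gamma_2}{\gamma_1}$.

Since the last two equations are obtained from the first two by the permutation $A \to \hat{A}$, $B \to \hat{B}$, $\gamma_1 \to \gamma_2$, $\gamma_2 \to \gamma_3$, we have

$\hat{B} = \dfrac{\gamma_3}{1 - \gamma_2}$,   $\hat{A} = \dfrac{1 - \gamma_3}{\gamma_2}$.

If $\gamma_1 = \gamma_2 = \gamma_3 = \gamma$ (say) then $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}} = \dfrac{1 - \gamma}{\gamma}$.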
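The approximate solution (5.6), (5.7) and its permuted counterpart are simple enough to check by substitution. The sketch below (hypothetical function name) computes the four constants from $\gamma_1, \gamma_2, \gamma_3$ so that the unbracketed parts of (5.2) to (5.5) can be verified directly.

```python
def approximate_constants(g1, g2, g3):
    """Constants obtained from (5.2)-(5.5) when the bracketed terms are neglected."""
    A = (1.0 - g2) / g1          # (5.7)
    B = g2 / (1.0 - g1)          # (5.6)
    A_hat = (1.0 - g3) / g2      # (5.7) with gamma_1 -> gamma_2, gamma_2 -> gamma_3
    B_hat = g3 / (1.0 - g2)      # (5.6) with the same permutation
    return A, B, A_hat, B_hat
```

Substituting the returned values into $(1-B)/(A-B)$ and $B(A-1)/(A-B)$ recovers $\gamma_1$ and $\gamma_2$ respectively, up to rounding.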

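We shall consider the bracketed quantities negligible if the result of neglecting them produces a change of less than 20% in $[1 - L(\theta \mid R)]$ at $\theta = \theta_2, \theta_3$ respectively, i.e., if

(5.8)   $\dfrac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r} \le \dfrac{\gamma_2}{5}$

(5.9)   $\dfrac{B^r(A^r - 1)}{A^r - B^r} \le \dfrac{\gamma_2}{5}$.

Inequality (5.9) can be written as

$\dfrac{\gamma_2^r\left[(1 - \gamma_2)^r - \gamma_1^r\right]}{(1 - \gamma_2)^r (1 - \gamma_1)^r - \gamma_1^r \gamma_2^r} \le \dfrac{\gamma_2}{5}$

or

$(1 - \gamma_2)^r \left[5\gamma_2^{r-1} - (1 - \gamma_1)^r\right] \le \gamma_1^r \gamma_2^{r-1}(5 - \gamma_2)$.

This will certainly hold if

$5\gamma_2^{r-1} \le (1 - \gamma_1)^r$.

Assume that $\gamma_1$, $\gamma_2$ and $\gamma_3$ are each less than $\tfrac{1}{2}$. Then the last inequality can be written as

(5.10)   $r \ge \dfrac{\log (5/\gamma_2)}{\log \left(\dfrac{1 - \gamma_1}{\gamma_2}\right)}$.

Starting with (5.8) the same relation is obtained except that $\gamma_1$ is replaced by $\gamma_3$, namely

(5.11)   $r \ge \dfrac{\log (5/\gamma_2)}{\log \left(\dfrac{1 - \gamma_3}{\gamma_2}\right)}$.

Let

$k = \dfrac{\log (5/\gamma_2)}{\log \left(\dfrac{1 - \bar{\gamma}}{\gamma_2}\right)}$

where $\bar{\gamma}$ is the larger of $\gamma_1$ and $\gamma_3$. Then $k$ is the larger of the right hand members of (5.10) and (5.11). Then for (5.8) and (5.9) to hold it is sufficient that

$r \ge k$.

If $\gamma_2 = .05$ and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately $\dfrac{2}{1.3} = 1.54$. If $\gamma_2 = .01$ and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately 1.35.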
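The threshold $k$ and the quantity $r$ are easy to compute. The following sketch uses hypothetical function names and assumes $r = (\theta_3 + \theta_4 - 2\theta_2)/\Delta$ with $\Delta = \theta_2 - \theta_1$, as in the relations of this section; the base of the logarithm cancels in the ratio.

```python
import math

def k_threshold(g1, g2, g3):
    """Larger of the right-hand members of (5.10) and (5.11)."""
    g_bar = max(g1, g3)
    return math.log(5.0 / g2) / math.log((1.0 - g_bar) / g2)

def r_value(thetas):
    """r = h_2(theta_2) for the four boundary values (theta_1, ..., theta_4)."""
    t1, t2, t3, t4 = thetas
    return (t3 + t4 - 2.0 * t2) / (t2 - t1)
```

With $\gamma_1 = \gamma_2 = \gamma_3 = .029$ the sketch gives $k$ close to 1.47, so a configuration with $r = 7$ comfortably satisfies $r \ge k$.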


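We shall now investigate under what conditions the approximate solutions obtained above for $A$, $B$, $\hat{A}$, $\hat{B}$ are such that acceptance of both $\theta_1$ and $\theta_4$ is impossible.

It follows from (2.1) that the following pair of inequalities are sufficient for the impossibility of accepting both $\theta_1$ and $\theta_4$:

(5.12)   $\dfrac{A}{\hat{A}} = \dfrac{\gamma_2(1 - \gamma_2)}{\gamma_1(1 - \gamma_3)} \le 1$,   $\dfrac{B}{\hat{B}} = \dfrac{\gamma_2(1 - \gamma_2)}{\gamma_3(1 - \gamma_1)} \le 1$.

If $\gamma_1 \ne \gamma_3$ let the smaller and larger of the pair $(\gamma_1, \gamma_3)$ be denoted by $\underline{\gamma}$ and $\bar{\gamma}$ respectively. Since $1 - \underline{\gamma} > 1 - \bar{\gamma}$, then

$\dfrac{\gamma_2(1 - \gamma_2)}{\underline{\gamma}(1 - \bar{\gamma})} > \dfrac{\gamma_2(1 - \gamma_2)}{\bar{\gamma}(1 - \underline{\gamma})}$

and we need only consider one of the two inequalities in (5.12). The condition $\gamma_2 \le \underline{\gamma}$ will in general satisfy (5.12). More precisely if all the $\gamma$'s are restricted to the interval (0, .1) then

$\dfrac{9}{10} < \dfrac{1 - \bar{\gamma}}{1 - \gamma_2} < \dfrac{10}{9}$

and it is sufficient for the validity of (5.12) that $\gamma_2 \le (.9)\underline{\gamma}$.

If $\gamma_1 = \gamma_3 = \gamma$ (say) then the two inequalities reduce to one

$\gamma_2^2 - \gamma_2 + \gamma - \gamma^2 \ge 0$

which can be written as

$(\gamma_2 - \gamma)(\gamma_2 - 1 + \gamma) \ge 0$.

Since the inequality $\gamma_2 \ge 1 - \gamma$ is impossible when all $\gamma$'s are $< \tfrac{1}{2}$, we see that $\gamma_2 \le \gamma$ is sufficient for the validity of (5.12) when $\gamma_1 = \gamma_3 = \gamma < \tfrac{1}{2}$.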

There remains the problem of finding an approximate solution for equations (5.2) to (5.5) when $r < k$. Since

$r = \dfrac{d - \Delta/2}{\Delta/2} \ge \dfrac{\Delta - \Delta/2}{\Delta/2} = 1$

we merely have to consider the interval $1 \le r < k$.
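The following approximations are used

(5.13)   $\dfrac{1 - B}{A - B} \sim \dfrac{1}{A}$;   $\dfrac{B(A - 1)}{A - B} \sim B$;   $\dfrac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r} \sim \dfrac{1}{\hat{A}^r}$;

         $\dfrac{1 - \hat{B}}{\hat{A} - \hat{B}} \sim \dfrac{1}{\hat{A}}$;   $\dfrac{B^r(A^r - 1)}{A^r - B^r} \sim B^r$;   $\dfrac{\hat{B}(\hat{A} - 1)}{\hat{A} - \hat{B}} \sim \hat{B}$

which upon substitution yield

(5.14)   $\dfrac{1}{A} = \gamma_1$

(5.15)   $\hat{B} = \gamma_3$

(5.16)   $B + \dfrac{1}{\hat{A}^r} = \gamma_2$

(5.17)   $\dfrac{1}{\hat{A}} + B^r = \gamma_2$.

Subtraction of (5.17) from (5.16) shows that $B = \dfrac{1}{\hat{A}}$ is a solution. Substituting this result back in (5.16) leads to the equation

(5.18)   $B + B^r = \gamma_2$.

It can easily be verified that between zero and unity this equation has exactly one root. Since $1 \le r < \infty$, the root of the above equation lies between $\dfrac{\gamma_2}{2}$ and $\gamma_2$.

Taking $\gamma_2$ as a first approximation for $B$ and substituting $\gamma_2 + \epsilon$ for $B$ in (5.18), we obtain

$\epsilon + (\gamma_2 + \epsilon)^r = 0$.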




























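Expanding $(\gamma_2 + \epsilon)^r$ in a power series in $\epsilon$ and neglecting second and higher order terms, the above equation gives

$\epsilon \sim -\dfrac{\gamma_2^r}{1 + r\gamma_2^{r-1}}$.

Thus,

(5.19)   $B = \dfrac{1}{\hat{A}} \sim \gamma_2 - \dfrac{\gamma_2^r}{1 + r\gamma_2^{r-1}} = \dfrac{\gamma_2 + (r - 1)\gamma_2^r}{1 + r\gamma_2^{r-1}}$.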
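To see how close the first-order value (5.19) is to the actual root of (5.18), the root can be found by bisection on the interval $[\gamma_2/2, \gamma_2]$ where the text locates it. This is an illustrative sketch with hypothetical function names, not part of the original derivation.

```python
def solve_B(g2, r, tol=1e-14):
    """Root of B + B**r = g2 by bisection on [g2/2, g2], valid for r >= 1 and 0 < g2 < 1."""
    lo, hi = g2 / 2.0, g2
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid + mid ** r - g2 > 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def approx_B(g2, r):
    """First-order approximation (5.19)."""
    return g2 - g2 ** r / (1.0 + r * g2 ** (r - 1.0))
```

For example, with $\gamma_2 = .05$ and $r = 1.2$ the two values agree to within about one percent of $\gamma_2$.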
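It is necessary to investigate under what conditions the above approximate solution satisfies (5.2) to (5.5) to within a 20% error in $[1 - L(\theta \mid R)]$, i.e., such that

(5.20)   $\left|\gamma_1 - \dfrac{\gamma_1(1 - B)}{1 - \gamma_1 B}\right| \le \dfrac{\gamma_1}{5}$

(5.21)   $\left|\gamma_3 - \dfrac{\gamma_3(1 - B)}{1 - \gamma_3 B}\right| \le \dfrac{\gamma_3}{5}$

(5.22)   $\left|\gamma_2 - \dfrac{B(1 - \gamma_1)}{1 - \gamma_1 B} - \dfrac{B^r(1 - \gamma_3^r)}{1 - (\gamma_3 B)^r}\right| \le \dfrac{\gamma_2}{5}$

(5.23)   $\left|\gamma_2 - \dfrac{B(1 - \gamma_3)}{1 - \gamma_3 B} - \dfrac{B^r(1 - \gamma_1^r)}{1 - (\gamma_1 B)^r}\right| \le \dfrac{\gamma_2}{5}$

where for $B$ the value in (5.19) is understood.

It can be shown that if $\gamma_1$, $\gamma_2$, $\gamma_3$ are each between zero and .1 then the inequalities (5.20) to (5.23) hold. Furthermore it can be shown that if, in addition, $\gamma_2 \le \min(\gamma_1, \gamma_3)$ then also the inequalities (2.1) hold. The latter inequalities are sufficient to ensure the impossibility of accepting both $\theta_1$ and $\theta_4$.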


6. Bounds for the ASN Function. First we shall derive lower bounds for the ASN function. Let $E(n \mid \theta, R)$ denote the expected value of $n$ when $\theta$ is the true mean and $R$ is the sequential rule employed. For $\theta \le \theta_1$ the probability of coming to a decision first with $R_2$ is large and therefore

$E(n \mid \theta, R) \sim E(n \mid \theta, R_1)$   for $\theta \le \theta_1$.

From the definition of $R$ it follows that

$E(n \mid \theta, R) \ge E(n \mid \theta, R_1)$   for all $\theta$.

Hence $E(n \mid \theta, R_1)$ serves as a close lower bound when $\theta \le \theta_1$. Similarly

$E(n \mid \theta, R) \sim E(n \mid \theta, R_2)$   for $\theta \ge \theta_4$

and

$E(n \mid \theta, R) \ge E(n \mid \theta, R_2)$   for all $\theta$.

Hence $E(n \mid \theta, R_2)$ serves as a close lower bound for $\theta \ge \theta_4$. Combining the above we have

(6.1)   $E(n \mid \theta, R) \ge \operatorname{Max}\left[E(n \mid \theta, R_1),\ E(n \mid \theta, R_2)\right]$





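where, neglecting the excess over the boundary,

(6.2)   $E(n \mid \theta, R_1) = \dfrac{L(H_{\theta_1} \mid \theta, R_1)\log B + L(H_{\theta_2} \mid \theta, R_1)\log A}{\dfrac{\Delta}{\sigma^2}(\theta - a_1)}$

(6.3)   $E(n \mid \theta, R_2) = \dfrac{L(H_{\theta_3} \mid \theta, R_2)\log \hat{B} + L(H_{\theta_4} \mid \theta, R_2)\log \hat{A}}{\dfrac{\Delta}{\sigma^2}(\theta - a_2)}$.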



Formula (6.1) gives a valid lower bound over the whole range of $\theta$, but this lower bound will not be very close in the interval $(\theta_2, \theta_3)$, particularly in the neighbourhood of the mid-point $\dfrac{\theta_2 + \theta_3}{2}$. The authors were not able to find any simple method for obtaining a closer lower bound in this interval. The upper bound given later in this section will, however, be fairly close also in the interval $(\theta_2, \theta_3)$ and can be used as an approximation to the exact value.

We shall now derive upper bounds for the ASN function. Let $R_1^*$ be the following rule: "Continue to take observations until $R_1$ accepts $\theta_1$." Since this implies the rejection of $\theta_4$ at the same or at a previous stage, it follows that $R$ must terminate not later than $R_1^*$. Hence

(6.4)   $E(n \mid \theta, R_1^*) \ge E(n \mid \theta, R)$.

As a matter of fact one can easily verify that $E(n \mid \theta, R_1^*) > E(n \mid \theta, R)$. Thus $E(n \mid \theta, R_1^*)$ is an upper bound for $E(n \mid \theta, R)$. This upper bound will be close when the probability of accepting $\theta_1$ is high, i.e., for $\theta \le \theta_1$.

By the general formula

$E(n) = \dfrac{E(z_1 + \cdots + z_n)}{E(z)}$

(see p. 53 [1]) we obtain, upon neglecting the excess over the boundary,

(6.5)   $E(n \mid \theta, R_1^*) = \dfrac{\log B}{\dfrac{\Delta}{\sigma^2}(\theta - a_1)}$.

This coincides with (6.2) when $L(H_{\theta_2} \mid \theta, R_1) = 0$.

Similarly, if $R_2^*$ denotes the rule of continuing until $R_2$ accepts $\theta_4$, then

(6.6)   $E(n \mid \theta, R_2^*) > E(n \mid \theta, R)$

(6.7)   $E(n \mid \theta, R_2^*) = \dfrac{\log \hat{A}}{\dfrac{\Delta}{\sigma^2}(\theta - a_2)}$

and this will be a close upper bound for $\theta \ge \theta_4$.


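If $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}}$ and if $a_1 + a_2 = 0$ the above results reduce to

(6.8)   $E(n \mid \theta, R) \lesssim E(n \mid \theta, R_1^*) = \dfrac{-h}{\lambda + \theta}$

(6.9)   $E(n \mid \theta, R) \lesssim E(n \mid \theta, R_2^*) = \dfrac{h}{\theta - \lambda}$

where the symbol $\lesssim$ stands for a close inequality, and where $h = \dfrac{\sigma^2}{\Delta}\log A$ and $\lambda = a_2 = -a_1$.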


To establish an upper bound for the ASN function in the interval $\theta_2 \le \theta \le \theta_3$ we shall restrict ourselves to the case where $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}}$; these relations are fulfilled by the approximate values of $A$, $B$, $\hat{A}$, $\hat{B}$ suggested in Section 5 when $\gamma_1 = \gamma_2 = \gamma_3$ and $r \ge k$. We shall choose the origin to be at $\dfrac{a_1 + a_2}{2}$, i.e., we put $\dfrac{a_1 + a_2}{2} = 0$. Then the vertex $P$ of the triangle $(P_1, P_2, P)$ in diagram 1 lies on the abscissa axis and $OP_1 = OP_2 = h$. The abscissa of the vertex $P$ is $\dfrac{h}{\lambda} = N$ (say) where $\lambda = a_2 = -a_1$. Let $y = \sum_{i=1}^{N} X_i$ represent the sum of the first $N$ observations. Let $R_{23}$ denote the rule: "Continue until both $\theta_2$ and $\theta_3$ are accepted". This is tantamount to neglecting the two outer lines in diagram 1, i.e., the acceptance lines for $\theta_1$ and $\theta_4$. Then clearly,

(6.10)   $E(n \mid \theta, R_{23}) \ge E(n \mid \theta, R)$.

When $\theta$ lies between $\theta_2$ and $\theta_3$ this inequality will be close, since the probability of crossing either of the two outer lines is then small.

However $E(n \mid \theta, R_{23})$ was found difficult to compute and it was necessary to consider instead the rule $R'_{23}$: "Take $N$ observations. If $y = \sum_{i=1}^{N} X_i < 0$ then continue until $\theta_2$ is accepted. If $y > 0$ then continue until $\theta_3$ is accepted".³ Clearly,

(6.11)   $E(n \mid \theta, R'_{23}) \ge E(n \mid \theta, R_{23})$.

This inequality, however, will be close only if the probability of concluding the test before $N$ observations, given that $\theta_2 < \theta < \theta_3$, is small.

Some investigations by the authors seem to indicate that the inequality (6.11) will be close when $\Delta < \lambda$. This inequality is likely to be fulfilled in practical problems.

We shall now proceed to determine the value of $E(n \mid \theta, R'_{23})$. Neglecting the excess over the boundary, we have

(6.12)   $E\left(n \mid \theta, R'_{23}, \sum_{i=1}^{N} X_i = y\right) = N + \dfrac{y}{\lambda - \theta}$   for $y > 0$

³ The event $y = 0$ has probability zero and it is indifferent what rule is adopted for that case.

and

(6.13)   $E\left(n \mid \theta, R'_{23}, \sum_{i=1}^{N} X_i = y\right) = N - \dfrac{y}{\lambda + \theta}$   for $y < 0$

where, for any condition $C$, $E(n \mid \theta, R, C)$ denotes the conditional expected value of $n$ given that the true mean is $\theta$, that $R$ is the sequential rule used and that the condition $C$ is fulfilled.

Multiplying with the density of $y$ and then integrating with respect to $y$, we obtain after simplification


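(6.14)   $E(n \mid \theta, R'_{23}) = \dfrac{1}{\lambda^2 - \theta^2}\left[h\lambda + 2h\theta\,\phi\!\left(\dfrac{\theta}{\sigma}\sqrt{\dfrac{h}{\lambda}}\right) + 2\sigma\sqrt{\dfrac{h\lambda}{2\pi}}\; e^{-(h\theta^2/2\lambda\sigma^2)}\right]$

where $\phi(z) = \displaystyle\int_0^z \dfrac{e^{-(u^2/2)}}{\sqrt{2\pi}}\, du$, and $\theta_2 < \theta < \theta_3$.

In particular, for $\theta = 0$ we get

(6.15)   $E(n \mid \theta = 0, R'_{23}) = \dfrac{h}{\lambda} + \dfrac{\sigma}{\lambda}\sqrt{\dfrac{2h}{\pi\lambda}}$.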

To establish a close upper bound for $\theta_3 < \theta < \theta_4$ we must bring the line of acceptance of $\theta_4$ into account. The line of acceptance of $\theta_1$ can be disregarded since the probability of accepting $\theta_1$ is very small.

We therefore define the rule $R_{24}$ as follows:

"Continue with $R_1$ until $\theta_2$ is accepted and with $R_2$ until either $\theta_3$ or $\theta_4$ is accepted."

Since the ASN function for $R_{24}$ is difficult to compute we define a modified rule $R'_{24}$ as follows:

"Proceed to take $N \left(= \dfrac{h}{\lambda}\right)$ observations without regard to any rule. If $y = \sum_{i=1}^{N} X_i < 0$ then continue only with $R_1$ until $\theta_2$ is accepted. If $0 < y < 2h$ then continue only with $R_2$ until either $\theta_3$ or $\theta_4$ is accepted. If $y \ge 2h$ then stop taking observations and accept $H_3$."

It is clear that the following inequalities hold

(6.16)   $E(n \mid \theta, R'_{24}) \ge E(n \mid \theta, R_{24}) \ge E(n \mid \theta, R)$.

The proximity of $E(n \mid \theta, R_{24})$ and $E(n \mid \theta, R)$, as stated above, is based on the fact that the probability of accepting $\theta_1$, when $\theta_3 < \theta < \theta_4$, is small.

The proximity of $E(n \mid \theta, R'_{24})$ and $E(n \mid \theta, R_{24})$ is assured if the probability of terminating with $R_{24}$ (and with $R$) before $N$ observations is small. It can be shown that the latter condition is fulfilled when $\Delta < \lambda$. In terms of the quantity $r$ defined in Section 5 this can be written as $r > 3$.

To determine the value of E(n/@, Ry) the following two preliminary results 
will be needed: 




If 0 < y < 2h, 
a ame 


1 
ah — y — 2h) 1 Se 


” ‘ i< h 
(6.17) B(n/6, Ri, 2 Xi = y= 2+ oe 





= C (say). 
If $y < 0$,

(6.18)   $E\left(n \mid \theta, R'_{24}, \sum_{i=1}^{N} X_i = y\right) = N - \dfrac{y}{\lambda + \theta} = D$ (say).


Both are easily obtained from formula (7.25) on p. 123 of [1]. 
Multiplying with the density of y and integrating with respect to y, we obtain 
after simplification 


ea 2— 06, /h 0 /h 
E(n/0, Ris) = 5 +[o( y/") 6(24/%)| 

h 9 DoW AO) 10?) 
— - ae 
a= (: l+e 


o AN 5 ~(n02/2d02) —h (24-8) 2/2he2 
eegeronmnmnonte —e€ 
+ MA — 8) Qn le 


ho , 6 rv eo (n0? ho? 
- oem |} -20(2 4/8) |+ av? } : 


Formula (6.19) is an improvement on (6.14) as it will give for any 6 a smaller 
upper bound, but in the neighborhood of the origin the difference is insignificant. 
For 6 = \ we obtain from (6.19) using L’Hopital’s rule 


9 
« 








(6.19) 


E(n/d, Ru) = = — z.. (4hX — 30°) 
(6.20) - 40" 
{i ~ % ( V my + (> + =) hr eo OhNI208) 
~ o 2072 6 On 
If $\dfrac{\sqrt{h\lambda}}{\sigma} > 2.5$, the above formula can be approximated by
(6.21) E(n/d, Ru) ~ s ‘eae cr. 


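Since the right hand member above lies between $\dfrac{h^2}{\sigma^2}$ and $(1.002)\dfrac{h^2}{\sigma^2}$ when $\dfrac{\sqrt{h\lambda}}{\sigma} > 2.5$ then for practical purposes

(6.22)   $E(n \mid \lambda, R'_{24}) \sim \dfrac{h^2}{\sigma^2}$   when $\left(\dfrac{\sqrt{h\lambda}}{\sigma} > 2.5\right)$.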














An upper bound for $E(n \mid \theta, R)$ for $\theta_1 < \theta < \theta_2$ can be obtained by defining $R_{13}$ and $R'_{13}$ in an analogous way to $R_{24}$ and $R'_{24}$. Because of reasons of symmetry, $E(n \mid \theta, R'_{13})$ can be obtained from (6.19) by replacing $\theta$ by $-\theta$.

The method used for obtaining upper bounds for $E(n \mid \theta, R)$ can easily be extended to the more general case when the equalities $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}}$ do not necessarily hold. However, the resulting formulas are more cumbersome and we shall merely give without proof the upper bound corresponding to (6.14). This upper bound becomes


E(n/0, Rs) = N + & — aE ~ (a) | + (* + aE - 6(6) | 


N ewe —b2/2 
' LV , e 
VOW oF ; =3 Tis | 








bo 

















where 
hy = x log A hi = - log B 
hy = : log A ha = x log B 
a= —a=rX 
ar _ Ar — hoo , hs — Né ; _ b+ NO _— hu + hoo 
——— lO eS ee 
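7. An Example. We shall consider the following example

$\sigma = 1$, $\theta_1 = -\tfrac{5}{16}$, $\theta_2 = -\tfrac{3}{16}$, $\theta_3 = \tfrac{3}{16}$, $\theta_4 = \tfrac{5}{16}$, $\gamma_1 = \gamma_2 = \gamma_3 = \gamma = .029$

then

$A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}} = \dfrac{1 - \gamma}{\gamma}$,   $r = 7 > 3 > k \sim 1.47$

and

$h = \dfrac{\sigma^2}{\Delta}\log A = 28$,   $\lambda = a_2 = -a_1 = \tfrac{1}{4}$,   $\Delta = \theta_2 - \theta_1 = \theta_4 - \theta_3 = \tfrac{1}{8}$.

Using formulas (6.1) and (6.7) the following upper and lower bounds were obtained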








$\theta$       | 5/16 | 6/16 | 7/16 | 8/16 | 9/16 | 10/16 | 12/16 | 14/16 | 16/16 | 18/16 | 20/16
Upper bound    | 448  | 224  | 149  | 112  | 89.6 | 74.7  | 56    | 44.8  | 37.3  | 32    | 28
Lower bound    | 421  | 224  | 149  | 112  | 89.6 | 74.7  | 56    | 44.8  | 37.3  | 32    | 28
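The entries of this table can be approximately reproduced from (6.7) and (6.3) in the symmetric case. The sketch below is illustrative only: the function names are hypothetical, and it assumes the rounded value $h = 28$ (so that $\log \hat{A} = h\Delta/\sigma^2 = 3.5$), $\lambda = a_2 = 1/4$ and $\Delta = 1/8$, which appear to be the values used in the example.

```python
import math

def upper_bound_67(theta, h, lam):
    """(6.7) with a_2 = lambda and h = (sigma^2 / Delta) log A_hat; needs theta > lambda."""
    return h / (theta - lam)

def lower_bound_63(theta, h, lam, delta, sigma=1.0):
    """(6.3) in the symmetric case A_hat = 1 / B_hat, a_2 = lambda; needs theta != lambda."""
    log_a = h * delta / sigma ** 2
    h2 = 2.0 * (lam - theta) / delta
    # Probability that R_2 accepts theta_3, from (3.5) with B_hat = 1 / A_hat.
    p3 = (math.exp(h2 * log_a) - 1.0) / (math.exp(h2 * log_a) - math.exp(-h2 * log_a))
    numerator = p3 * (-log_a) + (1.0 - p3) * log_a
    return numerator / ((delta / sigma ** 2) * (theta - lam))
```

With these inputs the upper bound at $\theta = 5/16$ is exactly 448, and the lower bound at the same point comes out near 422, within rounding of the tabulated 421.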













Formulas (6.14) and (6.1) yield 





$\theta$       | 0   | 1/16 | 2/16 | 3/16
Upper Bound    | 146 | 163  | 229  | 450
Lower Bound    | 112 | 149  | 224  | 421
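The upper bounds in this table follow from (6.14). A minimal sketch, assuming $\phi(z)$ denotes the normal integral from 0 to $z$ and using the example values $h = 28$, $\lambda = 1/4$, $\sigma = 1$; the function names are hypothetical.

```python
import math

def phi0(z):
    """Integral of the standard normal density from 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def asn_upper_614(theta, h, lam, sigma=1.0):
    """Upper bound (6.14) for the ASN function, intended for |theta| < lambda."""
    u = (theta / sigma) * math.sqrt(h / lam)
    bracket = (h * lam
               + 2.0 * h * theta * phi0(u)
               + 2.0 * sigma * math.sqrt(h * lam / (2.0 * math.pi))
               * math.exp(-h * theta ** 2 / (2.0 * lam * sigma ** 2)))
    return bracket / (lam ** 2 - theta ** 2)
```

Evaluating at $\theta = 0, 1/16, 2/16, 3/16$ gives values that round to the tabulated upper bounds, and the function is even in $\theta$.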


In the neighborhood of the origin the true value is very nearly the upper bound. 
From formulas (6.19), (6.22) and (6.1) we obtain 














$\theta$       | 3/16 | 4/16  | 5/16
Upper Bound    | 422  | 784.5 | 423
Lower Bound    | 421  | 784   | 421


As shown above for the end points of the indifference zone, (6.19) gives better 
results than (6.14) or (6.7). This is as it should be since (6.19) takes into account 
possibilities omitted in (6.14) and (6.7). The greater accuracy of (6.19) is offset 
by a slight increase in computation. 

In the graph of the Bounds of the ASN function shown in Figure 2, a single 
curve is shown wherever the upper and lower bound are sufficiently close to 
each other. 

Since (6.14) contains an even function of $\theta$ and since elsewhere the corresponding bounds are mirror images with respect to $\theta = 0$, the bounds for negative $\theta$ are exactly the same as those for the corresponding positive $\theta$.

Consider the following non-sequential rule applied to our problem. With a fixed number $N_0$ of observations compute the mean $\bar{x}$ and accept $H_1$ if $\bar{x}$ falls in the interval $(-\infty, a_1)$, accept $H_2$ if $\bar{x}$ falls in $[a_1, a_2]$ and accept $H_3$ if $\bar{x}$ falls in $(a_2, \infty)$. This is certainly a reasonable procedure. One can also verify that no other non-sequential rule exists that is uniformly better (for all possible values of $\theta$) than the one under consideration.

The two decision procedures become comparable if we introduce the indifference zones and define a wrong decision in the non-sequential case exactly as was done for our sequential procedure (see Section 1).

For the non-sequential case (just as in the sequential case) the probability of a wrong decision will be discontinuous at $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$. At each of these points there will be a left-sided and right-sided limit, different from each other. As in the sequential case we shall take the probability of a wrong decision at a discontinuity point to be equal to the larger of the left and right hand limits. One can easily verify that the maximum probability of a wrong decision occurs at $\theta = \theta_2$ (which is equal to the value at $\theta = \theta_3$).













We then determine $N_0$ by setting the maximum probability of a wrong decision equal to $\gamma$, i.e.

(7.1)   $\phi\!\left(\dfrac{d - \Delta/2}{\sigma}\sqrt{N_0}\right) + \phi\!\left(\dfrac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma$.
FIGURE 2. Upper and lower bounds for the ASN function (abscissa: $\theta$ in sixteenths).


For the particular problem considered above, this gives $N_0 = 915.4$. Hence 916 observations are required in order to ensure that this non-sequential procedure will have the maximum probability $\gamma = .029$ of a wrong decision. This is to be compared with the maximum over all $\theta$ of the ASN function in the sequential procedure, which was 784.5.
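The root of (7.1) can be found numerically. The sketch below is an illustration under stated assumptions: $\phi$ is taken as the normal integral from 0 to $z$, the function names are hypothetical, and $\gamma$ is taken as $1/(1 + e^{3.5})$, the value implied by the rounded $h = 28$ with $\Delta = 1/8$ (about .0293), which appears to be what yields 915.4.

```python
import math

def phi0(z):
    """Integral of the standard normal density from 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def fixed_sample_size(gamma, delta, d, sigma=1.0):
    """Solve (7.1) for N_0 by bisection; the left-hand side increases with N_0."""
    def excess(n):
        s = math.sqrt(n) / sigma
        return phi0((d - delta / 2.0) * s) + phi0((delta / 2.0) * s) - (1.0 - gamma)
    lo, hi = 0.0, 1.0
    while excess(hi) < 0.0:       # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With $\Delta = 1/8$ and $d = a_2 - a_1 = 1/2$, the computed root is close to 915.4, so 916 observations are needed, against a maximum ASN of about 784.5 for the sequential rule.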

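Returning to (7.1) we shall derive lower and upper bounds for the root of that equation. Since

$\phi(\infty) \ge \phi\!\left(\dfrac{d - \Delta/2}{\sigma}\sqrt{N_0}\right) \ge \phi\!\left(\dfrac{\Delta}{2\sigma}\sqrt{N_0}\right)$,

it is clear that the root of the equation

$\phi\!\left(\dfrac{\Delta}{2\sigma}\sqrt{N_0}\right) + \phi\!\left(\dfrac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma$

is an upper bound for the root of (7.1) and that the root of the equation

$\phi(\infty) + \phi\!\left(\dfrac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma$

is a lower bound for the root of (7.1). We shall compare the value of $x = \dfrac{\Delta}{2\sigma}\sqrt{N_0}$ with the value of $y = \dfrac{\Delta}{2\sigma}\sqrt{\operatorname{Max}_\theta \text{ASN}}$. Since

$\operatorname{Max}_\theta\ (\text{ASN function}) \sim \dfrac{\sigma^2}{\Delta^2}\left(\log \dfrac{1 - \gamma}{\gamma}\right)^2$   (for sufficiently small $\dfrac{\Delta}{d}$),

then

$y = \dfrac{\Delta}{2\sigma}\sqrt{\operatorname{Max}_\theta \text{ASN}} \sim \dfrac{1}{2}\log \dfrac{1 - \gamma}{\gamma}$   (for sufficiently small $\dfrac{\Delta}{d}$).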
20 Q 2 


The following table gives upper and lower bounds for $x$ and the corresponding value of $y$ for the type of example under consideration, i.e., when $A = \hat{A} = \dfrac{1}{B} = \dfrac{1}{\hat{B}}$ and $r \ge k$.

005 os | o | 45 A 

| —--Qxrr 
1.28-1.65 


002 





001 


| 
| 


7 } 
zandZ |3.08-3.31] 2.87-3.10 


As the table shows,⁴ for $.1 > \gamma > .008$

$x > y$

and hence

$N_0 > \operatorname{Max}_\theta \text{ASN}$   (for sufficiently small $\dfrac{\Delta}{d}$).

The statement and the table above are not meant to delimit the region in which the sequential rule is superior to the non-sequential procedure.

⁴ Actually, the inequality in question is shown only for the values of $\gamma$ given in the table. However it can be verified that the inequality remains valid for all values of $\gamma$ between .1 and .008.
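The comparison of $x$ and $y$ can be checked numerically. The sketch below assumes $\phi(z)$ is the normal integral from 0 to $z$ (so $\phi(\infty) = 1/2$), takes natural logarithms in $y$, and uses hypothetical function names; it relies only on `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
import math
from statistics import NormalDist

def x_bounds(gamma):
    """Lower and upper bounds for x = (Delta / 2 sigma) sqrt(N_0) from the two bounding equations."""
    nd = NormalDist()
    lower = nd.inv_cdf(1.0 - gamma)        # root of phi(inf) + phi(x) = 1 - gamma
    upper = nd.inv_cdf(1.0 - gamma / 2.0)  # root of 2 phi(x) = 1 - gamma
    return lower, upper

def y_value(gamma):
    """Approximate y = (1/2) log((1 - gamma) / gamma) for small Delta / d."""
    return 0.5 * math.log((1.0 - gamma) / gamma)
```

For $\gamma = .1$ the lower bound for $x$ is about 1.28 while $y$ is about 1.10; the lower bound stays above $y$ down to roughly $\gamma = .008$ and falls below it for smaller $\gamma$.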


REFERENCES 


[1] ABRAHAM WALD, Sequential Analysis, John Wiley and Sons, 1947. 
[2] P. ARMITAGE, "Some sequential tests of Student's hypothesis," Supplement to the Journal of the Royal Statistical Society, Vol. 9 (1947), No. 2, p. 250.










MOMENTS OF RANDOM GROUP SIZE DISTRIBUTIONS¹


By John W. Tukey
Princeton University
















1. Summary. A number of practical problems involve the solution of a mathematical problem of the class described in the classical language of probability theory as follows: "A number of balls are independently distributed among a number of boxes, how many boxes contain no balls, 1 ball, 2 balls, 3 balls, and so on." Problems arising in the oxidation of rubber and the genetics of bacteria are discussed as applications.

A method is given of solving problems of this sort when “how many” is 
adequately answered by the calculation of means, variances, covariances, third 
moments, etc. The method is applied to a number of the simplest cases, where 
the number of balls is fixed, binomially distributed or Poisson and where the 
“sizes” of the boxes are equal or unequal. 

























2. Introduction. The distribution of the number of empty boxes has been 
investigated by Romanovsky in 1934 [8], and, apparently independently, by 
Stevens in 1937 [4]. Romanovsky investigated the case of N equal boxes and 
m balls for (i) the case where the balls are independent, and (ii) the case where 
there is a limit to the size of each box. He gives no motivation for the problem, 
and shows that certain limiting distributions approach normality. Stevens 
investigated the case of m independent balls for N boxes (i) of equal size, and 
(ii) of unequal size, and developed a useful approximation for the last case. 
Stevens was concerned with this problem in order to test box counts for non- 
randomness by comparing the number of empty boxes with expectation. The 
reader interested in that problem is referred to his paper. 

The results derived in Part II are based on the use of a chance generating function, a technique which applies easily to the case where the balls are independent. Thus Romanovsky's results for the case of boxes of limited size are neither included nor extended. For the other cases where the number of empty boxes has been considered, the results below seem to provide simple moments and cross-moments for the numbers of boxes with any number of balls to the extent previously available for the number of empty boxes. Both Romanovsky and Stevens investigated the actual distribution of the number of empty boxes. A similar investigation of the distribution of the number of b-ball boxes has not been carried out here.







3. A chemical problem. In studying the oxidation of rubber, Tobolsky and 
coworkers were led to propose the following problem: “If a mass of rubber 
originally consisted of N chains of equal length, if each chain can be broken at a 








1 Prepared in connection with research sponsored by the Office of Naval Research. 




large number of places by the reaction with one oxygen molecule, if there are $m$ oxygen molecules each equally likely to react at each link, and if $mNp$ molecules have reacted, what is the probable number of original chains which are now in $b + 1$ parts as a result of $b$ oxygen molecules having reacted with $b$ of their links?" Here an original chain plays the role of a box and an oxygen molecule the role of a ball. The sort of numbers which may be taken as characteristic are:


$N = 10^{18}$ (number of chains),
$m = 10^{16}$ to $10^{20}$ (number of oxygen molecules),
$mp = 0.01$ to $100$ (average breaks/chain).


Thus it is almost certainly going to be appropriate to use the results obtained by assuming $N$ and $m$ very large and $p = 1/N$ very small. We shall return to this example after discussing the general results.


4. A bacteriological problem. The experiments of Newcombe [1] on the 
irradiation and mutation of bacteria have prompted Pittendrigh to propose the 
following problem: “Suppose a large number of bacteria each contain m enzyme 
particles, which have been formed by the action of a nuclear gene. Suppose 
that irradiation destroys the nuclear gene in a certain fraction of the bacteria. 
Suppose three generations to occur, during which the m original enzyme particles 
are randomly distributed among the 8 descendants of an original bacterium. 
If a bacterium without either nuclear gene or enzyme particle is a recognizable 
mutant, what is the expected distribution of 'families' with 0, 1, 2, 3, ..., 8 mutants?"

Here the enzyme particles are the balls, and the 8 descendants are the $N$
boxes. We are interested in the number of empty boxes—the problem is that 
discussed by both Romanovsky and Stevens, with the exception of an allowance 
for cases where the nuclear gene was not lost. We shall return to this problem 
also after discussing the general results. 


5. The case of large numbers. In case the number of "balls" and "boxes" is large, it is natural and has been customary in similar problems to replace discrete variables by continuous, and derive differential equations. The process runs as follows: Let $y_0, y_1, y_2, \dots, y_b, \dots$ be the fractions of the total number of boxes containing no, one, two, ..., $b$, ... balls. Let $t$ be the average number of balls per box (artificially made continuous, so that we may, for example, have a total of 13 + 37 balls). Increase $t$ to $t + dt$, then of the $y_0$ boxes previously containing no balls, $y_0\,dt$ will receive one. Of the $y_1$ boxes previously containing one ball each, $y_1\,dt$ will receive a second, and so on. Hence

$\dfrac{dy_0}{dt} = -y_0$,

$\dfrac{dy_b}{dt} = y_{b-1} - y_b$,

and if we start, when $t = 0$, with $y_0 = 1$, and $y_b = 0$ for $b > 0$, we find

(1)   $y_b = \dfrac{t^b}{b!}\, e^{-t}$,   $b = 0, 1, 2, \dots$
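That (1) solves the differential equations with the stated initial conditions can be checked numerically. The sketch below integrates the system with a plain forward Euler step and compares the result with the Poisson fractions; the function names and the step count are illustrative choices, not part of the original argument.

```python
import math

def integrate_occupancy(t_end, b_max, steps=20000):
    """Forward-Euler integration of dy_0/dt = -y_0, dy_b/dt = y_{b-1} - y_b from t = 0."""
    y = [1.0] + [0.0] * b_max          # all boxes empty at t = 0
    dt = t_end / steps
    for _ in range(steps):
        dy = [-y[0]] + [y[b - 1] - y[b] for b in range(1, b_max + 1)]
        y = [yb + dt * d for yb, d in zip(y, dy)]
    return y

def poisson_fraction(t, b):
    """Right-hand side of (1): t^b e^{-t} / b!."""
    return t ** b * math.exp(-t) / math.factorial(b)
```

Because the equation for $y_b$ involves only $y_{b-1}$ and $y_b$, truncating the system at `b_max` does not affect the lower-order fractions.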


The usefulness of this result has sometimes been in doubt, thus Opatowski [2, p. 164] says in a similar connection: "Consequently ... the theory appears less accurate for small values of $t$."

It is shown in Part II that, where $n_b$ boxes out of the total of $N$ contain exactly $b$ balls: (I) When the number of balls and boxes is large and fixed, (1) is a good approximation to the expectation of $n_b/N$. (II) When the total number of balls has a Poisson distribution, and $t$ is interpreted as the expected number, (1) reproduces the expectation exactly. Since it is appropriate in most problems involving chemical reactions or irradiation to take the number of balls as having a









TABLE 1
A fixed or binomial number of balls and equal boxes

HYPOTHESIS
A total of $m$ balls are independently distributed into $N$ boxes or elsewhere, the chance of a particular ball entering a particular box is $p$. The number of boxes each containing exactly $b$ balls is $n_b$.

(2)   Mean of $n_b = E(n_b) = N \binom{m}{b} p^b (1 - p)^{m-b}$

Variance of $n_b = E(n_b)\left(1 - (1 - \phi(b, b))E(n_b)\right)$

Covariance of $n_b$ and $n_c = -(1 - \phi(b, c))E(n_b)E(n_c)$

where $\phi(b, c) = \left(1 - \dfrac{1}{N}\right)\dfrac{(m - b)^{(c)}}{m^{(c)}}\,\dfrac{(1 - 2p)^{m-b-c}}{(1 - p)^{2m-b-c}}$

and $m^{(b)} = m(m - 1)\cdots(m - b + 1)$ involving $b$ factors.

Higher moments: See Section 14

Mean of $n_0 = N(1 - p)^m$

Mean of $n_1 = Nm(1 - p)^{m-1}p$

Variance of $n_0 = N(1 - p)^m - N^2(1 - p)^{2m} + N(N - 1)(1 - 2p)^m$

Variance of $n_1 = N(N - 1)m(m - 1)(1 - 2p)^{m-2}p^2 + Nm(1 - p)^{m-1}p - N^2m^2(1 - p)^{2m-2}p^2$

Covariance of $n_0$ and $n_1 = N(N - 1)m(1 - 2p)^{m-1}p - N^2m(1 - p)^{2m-1}p$
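For small $m$ and $N$ the special-case entries of Table 1 can be checked by brute-force enumeration of every placement of the balls. The following sketch is illustrative (hypothetical function names); it treats "elsewhere" as one extra cell with probability $1 - Np$.

```python
from itertools import product
from math import comb

def exact_moments(m, N, p):
    """E(n_b) and E(n_b n_c) by enumerating all placements of m balls into N boxes or elsewhere."""
    cell_prob = [p] * N + [1.0 - N * p]            # the last cell is 'elsewhere'
    e1 = [0.0] * (m + 1)
    e2 = [[0.0] * (m + 1) for _ in range(m + 1)]
    for placement in product(range(N + 1), repeat=m):
        prob = 1.0
        for cell in placement:
            prob *= cell_prob[cell]
        counts = [placement.count(box) for box in range(N)]
        n = [counts.count(b) for b in range(m + 1)]  # n[b] = number of boxes holding b balls
        for b in range(m + 1):
            e1[b] += prob * n[b]
            for c in range(m + 1):
                e2[b][c] += prob * n[b] * n[c]
    return e1, e2

def table1_mean(m, N, p, b):
    return N * comb(m, b) * p ** b * (1 - p) ** (m - b)

def table1_var_n0(m, N, p):
    return N * (1 - p) ** m - N ** 2 * (1 - p) ** (2 * m) + N * (N - 1) * (1 - 2 * p) ** m

def table1_cov_n0_n1(m, N, p):
    return (N * (N - 1) * m * (1 - 2 * p) ** (m - 1) * p
            - N ** 2 * m * (1 - p) ** (2 * m - 1) * p)
```

The enumeration grows as $(N+1)^m$, so it is only practical as a check for very small cases such as $m = 3$, $N = 3$.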













Poisson distribution, the caution suggested by (I) is often shown unnecessary 
by (II). For this type of problem the differential equation is entirely adequate! 

It is further shown in Part II that, in the Poisson case, the second moments are exactly those which correspond to random sampling from an infinite population with the fractions indicated by the mean number of boxes with 0, 1, 2, ..., $b$, ... balls. This result is not accidental, and it is shown in Part III how we can see directly that the whole distribution in this case is that of random sampling from such a population.





6. The case of small numbers. The results of Part II also allow us to state 
the means, variances, and covariances, for the cases where the differential 
equations do not apply. The results are set forth in the following tables: Tables 1 
and 2 apply to the cases where m balls are distributed among the given boxes 
and possibly others. Thus the total number of balls in the given boxes is either 
fixed, when there are no other boxes, or follows a binomial distribution. 


TABLE 2 
A fixed or binomial number of balls and unequal boxes 














HYPOTHESIS
A total of $m$ balls are independently distributed into $N$ boxes or elsewhere, the chance of a particular ball entering the $i$th box being $p_i$. The average of the $p_i = p$. The sum of the squared fractional deviations of $p_i$ from $p$ is $\Lambda$. $p_i = p(1 + \lambda_i)$, $\sum_i \lambda_i^2 = \Lambda$. Terms in $\Lambda^2$, $\sum \lambda_i^3$, etc. are to be neglected. The number of boxes each containing exactly $b$ balls is $n_b$.








Mean of m = E(m) = N Ee} (1 — p)”’p’ times 


(C + sites op) ((mp — b)” — (m — b)p — b(1 — p) )| 


Variances and covariances as in Table 1, using 


| a 1 (m oi c)” _ p yy (; - = ( 4 


where y = 2bc (2p — x) + terms in p’ and in t 





The exact value of W is given in Section 16. 


. me Ap’ m(m o~ i 
Mes o= NI - + —; m 
Mean of n (1 p) (1 oN (i — p? ) 














| ———— —_ , A(m — 1)p(1 — mp) 
Mean of nm, = Nm(1 — p)” pp (1 _ NG = pe 
























TABLE 3
Poisson balls and equal boxes

HYPOTHESIS
A number of balls with the Poisson distribution, and expectation $Nt$, are independently placed in $N$ boxes. The number of boxes each containing exactly $b$ balls is $n_b$.

Mean of $n_b = E(n_b) = N\,\dfrac{t^b}{b!}\,e^{-t}$

Variance of $n_b = N\left(\dfrac{t^b}{b!}\,e^{-t}\right)\left(1 - \dfrac{t^b}{b!}\,e^{-t}\right)$

Covariance of $n_b$ and $n_c = -N\left(\dfrac{t^b}{b!}\,e^{-t}\right)\left(\dfrac{t^c}{c!}\,e^{-t}\right)$

Mean of $n_0 = Ne^{-t}$,

Mean of $n_1 = Nte^{-t}$,

Variance of $n_0 = Ne^{-t}(1 - e^{-t})$,

Variance of $n_1 = Nte^{-t}(1 - te^{-t})$,

Covariance of $n_0$ and $n_1 = -Nte^{-2t}$.








7. Discussion of the chemical problem. The number of oxygen molecules 
which have reacted in a given time is, at best, distributed Poisson. Thus the 
differential equations would give the expected number of cuts, even if the 
number of balls or boxes were not large. 

The fact that the numbers of balls and boxes are large makes the variances
and covariances so small as to be practically unimportant. Thus, for example,
with $N = 10^{18}$, $t = 1$ (1 break per chain), we have:

mean of $n_0 = \dfrac{1}{e} \times 10^{18}$,

mean of $n_1 = \dfrac{1}{e} \times 10^{18}$,

variance of $n_0 = \dfrac{1}{e}\left(1 - \dfrac{1}{e}\right) \times 10^{18}$,

variance of $n_1 = \dfrac{1}{e}\left(1 - \dfrac{1}{e}\right) \times 10^{18}$,

covariance of $n_0$ and $n_1 = -\dfrac{1}{e^2} \times 10^{18}$.

Thus the standard deviations are less than 1 part in 100 million of the mean.
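The orders of magnitude quoted here can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the Table 3 formulas for Poisson balls and equal boxes, takes $N = 10^{18}$ and $t = 1$, and uses a helper name (`poisson_equal_boxes`) that is ours, not the paper's.

```python
import math

def poisson_equal_boxes(N, t, b, c):
    """Mean and variance of n_b, and covariance of n_b with n_c (Table 3 formulas)."""
    pb = t ** b * math.exp(-t) / math.factorial(b)   # chance a box holds exactly b balls
    pc = t ** c * math.exp(-t) / math.factorial(c)
    mean = N * pb
    variance = N * pb * (1.0 - pb)
    covariance = -N * pb * pc
    return mean, variance, covariance

N, t = 1e18, 1.0
mean0, var0, cov01 = poisson_equal_boxes(N, t, 0, 1)
relative_sd = math.sqrt(var0) / mean0   # standard deviation as a fraction of the mean
print(mean0, var0, cov01, relative_sd)
```

The relative standard deviation comes out near $10^{-9}$, which is consistent with the "1 part in 100 million" bound stated above.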





TABLE 4
Poisson balls and varied boxes

HYPOTHESIS
A number of balls with the Poisson distribution are independently placed in $N$
unequal boxes. The expected number placed in the $i$th box is $t_i$. The average of
the $t_i$ is $t$, $t_i = t(1 + \lambda_i)$ and $\sum_i \lambda_i^2 = A$. Terms in $\sum \lambda_i^3$, $\sum \lambda_i^4$, etc. are to be
neglected. The number of boxes each containing exactly $b$ balls is $n_b$.

Mean of $n_b$ $= E(n_b) = N\dfrac{t^b}{b!}e^{-t}\left(1 + \dfrac{A}{2N}\big((b-t)^2 - b\big)\right)$


! 


Variance of n» = E(m) — ¥ (1 + _ (b — (E(m))") 


| Covariance of m and n, = -* (1 - st ((b — t)(e — t))E(m)E(n,) 


| 





| Mean of nm 


Meanofnm = 1 + =... 


A(? — ~ 
2N 


iaret At ssi 
Variance of 1 Ne . ~ Ne~ (1 


7. 9 
Variance of n, = Nte a) 
+2, ~2t A(30 — Gt v) 
Nte (: + a 


ON 


A(3t? — =) 


‘ j 742 —2t 
Covariance of m and n; = —Nte (1 4+. 





8. Discussion of the bacteriological example. Although this example came 
from an irradiation experiment, we are not entitled to jump to the Poisson 
case. The balls are not actions of radiation, but rather previously existing 
enzyme particles. The purpose of the radiation is merely to make a failure to 
hand down a particle obvious. 

For simplicity, let us begin by assuming that the irradiation has been strong 
enough to knock out all the nuclear genes and none of the enzyme particles. We 
face the following problem: “If the m enzyme particles are divided by chance 
among 8 descendants, what should be the distribution of mutants, that is, of 
boxes with no balls?” 

As far as the mean and variance are concerned, we can answer this question from
Table 1, with $N = 8$ and $p = \frac{1}{8}$. The results are

mean number of mutants $= E(n_0) = 8\left(\tfrac{7}{8}\right)^m$,

variance of same $= 8\left(\tfrac{7}{8}\right)^m - 64\left(\tfrac{7}{8}\right)^{2m} + 56\left(\tfrac{6}{8}\right)^m$.


For small values of m we get the values tabled below: 


TABLE 5
Blanks out of 8

  m  |  mean   | variance
  0  |  8      |  0.000
  1  |  7      |  0.000
  2  |  6.125  |  0.109
  3  |  5.359  |  0.262
  4  |  4.689  |  0.417
  5  |  4.103  |  0.556
  6  |  3.590  |  0.666
  7  |  3.142  |  0.747
  8  |  2.749  |  0.799
  9  |  2.405  |  0.825
 10  |  2.105  |  0.829
 15  |  1.079  |  0.663
 20  |  0.554  |  0.426

We notice that the variance is substantially less than the mean. 

Now it might be that the number of enzyme particles is not constant from 
bacterium to bacterium. It would not be unreasonable if it had a Poisson dis- 
tribution. If this were the case, we would revert to the differential equation 
solution, which is also given in Table 3. The last column in Table 5 shows the 
variance which would then arise for the same means. The variance is still some- 
what less than the mean. The situation is shown graphically in Figure 1. 

If the actual distribution of $n_0$ is desired, then it can be calculated for the
case where $m$ is fixed from the tables in Stevens' paper [4], and when $m$ is
distributed Poisson it is merely a binomial distribution.
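The figures for the number of blanks out of 8 are easy to recompute. The following sketch assumes the expressions $E(n_0) = N(1-p)^m$ and $E(n_0^2) = N(1-p)^m + N(N-1)(1-2p)^m$ with $N = 8$, $p = 1/8$ for a fixed number of balls, and uses the binomial variance $Nq(1-q)$ with $q = E(n_0)/N$ for the Poisson case with the same mean. The function names are ours.

```python
def blanks_fixed_balls(m, N=8):
    """Mean and variance of the number of empty boxes when exactly m balls are placed."""
    p = 1.0 / N
    mean = N * (1 - p) ** m
    second_moment = N * (1 - p) ** m + N * (N - 1) * (1 - 2 * p) ** m
    return mean, second_moment - mean ** 2

def poisson_variance_for_mean(mean, N=8):
    """Variance of n_0 when the balls are Poisson: n_0 is binomial with q = mean / N."""
    q = mean / N
    return N * q * (1 - q)

rows = []
for m in (0, 1, 2, 5, 10, 15, 20):
    mean, var_fixed = blanks_fixed_balls(m)
    rows.append((m, round(mean, 3), round(var_fixed, 3),
                 round(poisson_variance_for_mean(mean), 3)))
    print(rows[-1])
```

In every row the fixed-ball variance is below the Poisson variance for the same mean, and both are below the mean itself, as the discussion above indicates.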


PART II 


DERIVATIONS 


9. The chance generating function. We are considering the following class of
problems: "balls" are placed independently in "boxes" and then the number $n_0$
of empty compartments, the number $n_1$ of compartments containing exactly
one ball, ..., the number $n_b$ of boxes with exactly $b$ balls, and so on, are observed.
We are interested in the moments of $n_0, n_1, n_2, \ldots, n_b, \ldots$, both simple
and mixed.


[Figure 1. Ratio of variance to mean for the number of empty boxes out of eight, plotted against the average number of blanks, for a fixed number of balls (as indicated) and for a Poisson distribution of the number of balls.]

We define chance quantities $x_{iq}$ by

$$x_{iq} = \begin{cases} x, & q\text{th ball in the } i\text{th box},\\ 1, & \text{otherwise}.\end{cases}$$

Clearly the product of all $x_{iq}$ for fixed $i$ is given by

$$\prod_q x_{iq} = x^{(\text{number of balls in the } i\text{th box})}.$$

Thus $\prod_q x_{iq} = x^b$ if and only if there are exactly $b$ balls in the $i$th box. Hence
the coefficient of $x^b$ in $\sum_i \prod_q x_{iq}$, the sum of $\prod_q x_{iq}$ over all boxes $i$, is $n_b$, the
number of boxes containing exactly $b$ balls.

We have the relation

$$\sum_b n_b x^b = f(x) = \sum_i \prod_q x_{iq},$$

where $f(x)$ is a chance function, and the $n_b$ and the $x_{iq}$ are chance quantities.
Now we take expectations of both sides, and use the fact that the expectation
of a sum is the sum of the expectations to obtain

$$\sum_b x^b E(n_b) = E(f(x)) = \sum_i E\Big(\prod_q x_{iq}\Big).$$

Now $x_{iq}$ and $x_{ir}$, for $q \ne r$, are independent since they are determined by
different and independent balls. Hence $E(\prod_q x_{iq}) = \prod_q E(x_{iq})$ and we have the
basic formula

(1) $$E(f(x)) = \sum_b x^b E(n_b) = \sum_i \prod_q E(x_{iq}).$$
10. Higher moments. By extending this device, we can obtain generating
functions for higher moments. Instead of the $x_{iq}$, we introduce a whole sequence
of chance quantities $x_{iq}, y_{iq}, z_{iq}, \ldots, w_{iq}$, defined by

$$(x_{iq}, y_{iq}, \ldots, w_{iq}) = \begin{cases} (x, y, \ldots, w), & q\text{th ball in } i\text{th box},\\ (1, 1, \ldots, 1), & \text{otherwise}.\end{cases}$$

We find immediately that

$$f(x)f(y)\cdots f(w) = \Big(\sum_i \prod_q x_{iq}\Big)\Big(\sum_j \prod_q y_{jq}\Big)\cdots\Big(\sum_n \prod_q w_{nq}\Big) = \sum_i \sum_j \cdots \sum_n \prod_q x_{iq}y_{jq}\cdots w_{nq}.$$

Taking expectations on both sides,

$$E(f(x)f(y)\cdots f(w)) = \sum_i \sum_j \cdots \sum_n E\Big(\prod_q x_{iq}y_{jq}\cdots w_{nq}\Big) = \sum_i \sum_j \cdots \sum_n \prod_q E(x_{iq}y_{jq}\cdots w_{nq}),$$

where we have used the fact that $x_{iq}y_{jq}\cdots w_{nq}$ and $x_{ir}y_{jr}\cdots w_{nr}$ are independent
when $q \ne r$ since they are determined by different and independent balls.
On the other hand,

$$f(x)f(y)\cdots f(w) = \Big(\sum_b n_b x^b\Big)\Big(\sum_c n_c y^c\Big)\cdots\Big(\sum_a n_a w^a\Big) = \sum_b \sum_c \cdots \sum_a (n_b n_c \cdots n_a)(x^b y^c \cdots w^a),$$

so that

$$E(f(x)f(y)\cdots f(w)) = \sum_b \sum_c \cdots \sum_a (x^b y^c \cdots w^a)E(n_b n_c \cdots n_a).$$

Equating the two expressions for the expectation of $f(x)f(y)\cdots f(w)$, we have,
finally, the generating function for $E(n_b n_c \cdots n_a)$ in the form

(2) $$\sum_{b,c,\ldots,a} (x^b y^c \cdots w^a)E(n_b n_c \cdots n_a) = \sum_{i,j,\ldots,n} \prod_q E(x_{iq}y_{jq}\cdots w_{nq}).$$

Thus a knowledge of $E(x_{iq}y_{jq}\cdots w_{nq})$ will allow us to determine the moments
of the $n$'s.










11. A fixed or binomial number of balls and equal boxes. Let there be N 
boxes, and m balls, each with probability p of entering each box. If pN = 1 we 
have the case where m balls always appear in the boxes taken together—the 
case of a fixed number of balls. If pN < 1, the number of balls appearing in all 
boxes taken together is a binomial with expectation mpN. 

Now $x_{iq}$ equals 1 with probability $1 - p$ and equals $x$ with probability $p$,
hence (1) becomes

$$\sum_b x^b E(n_b) = \sum_i \prod_q (1 - p + px) = N(1 - p + px)^m.$$

Using the binomial theorem, the coefficient of $x^b$ is

(3) $$E(n_b) = N\binom{m}{b}(1-p)^{m-b}p^b = N\binom{m}{b}(1-p)^m\left(\frac{p}{1-p}\right)^b.$$

Now if $p$ is small, we may approximate $1 - p$ by $e^{-p}$ and by 1, respectively,
in its two occurrences, whence

$$E(n_b) \approx N\binom{m}{b}e^{-mp}p^b,$$

and if $m$ is large compared to $b$ this becomes

$$E(n_b) \approx N\frac{(mp)^b}{b!}e^{-mp}.$$
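Formula (3) can be checked directly on a small case by listing every placement of the balls. The sketch below is an illustration under our own choice of $m = 4$, $N = 3$, $p = 0.25$ (so that a ball falls "elsewhere" with probability $1 - Np = 0.25$); the function names are hypothetical.

```python
from itertools import product
from math import comb, isclose

def expected_counts_enumerated(m, N, p):
    """E(n_b), b = 0..m, by listing every way m balls can land in N boxes or elsewhere."""
    p_elsewhere = 1.0 - N * p
    expected = [0.0] * (m + 1)
    for placement in product(range(N + 1), repeat=m):  # label N stands for "elsewhere"
        prob = 1.0
        occupancy = [0] * N
        for box in placement:
            if box == N:
                prob *= p_elsewhere
            else:
                prob *= p
                occupancy[box] += 1
        for balls_in_box in occupancy:
            expected[balls_in_box] += prob   # this box contributes to n_b for its own b
    return expected

def expected_counts_formula(m, N, p):
    """E(n_b) = N * C(m, b) * (1 - p)^(m - b) * p^b, as in formula (3)."""
    return [N * comb(m, b) * (1 - p) ** (m - b) * p ** b for b in range(m + 1)]

m, N, p = 4, 3, 0.25
enumerated = expected_counts_enumerated(m, N, p)
closed_form = expected_counts_formula(m, N, p)
agree = all(isclose(u, v, abs_tol=1e-12) for u, v in zip(enumerated, closed_form))
print(agree, closed_form)
```

The enumeration visits $(N+1)^m$ placements, so it is only practical for very small $m$; its purpose is to confirm the closed form, not to replace it.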


12. Second moments. We must study $E(x_{iq}y_{jq})$. If $i = j$ then this is
$(1 - p + pxy)$ since the $q$th ball falls into both the $i$th and $j$th boxes with
probability $p$, otherwise into neither. If $i \ne j$, we immediately find the expectation
to be $(1 - 2p + px + py)$.

Hence, since $i = j$ in $N$ cases, and $i \ne j$ in $N(N-1)$ cases,

$$\sum_{i,j} \prod_q E(x_{iq}y_{jq}) = N(1 - p + pxy)^m + N(N-1)(1 - 2p + px + py)^m;$$

by (2) this equals $\sum_{b,c} x^b y^c E(n_b n_c)$, and using the multinomial expansion we find

$$E(n_b n_c) = N(N-1)\binom{m}{b\;c}(1-2p)^{m-b-c}p^{b+c} + \delta(b,c)N\binom{m}{b}(1-p)^{m-b}p^b,$$

where $\delta(b,c) = 1$ when $b = c$ and is zero otherwise, and where the multinomial
coefficient $\binom{m}{b\;c}$ is given by

$$\binom{m}{b\;c} = \frac{m!}{b!\,c!\,(m-b-c)!}.$$

We now set

(4) $$E(n_b n_c) = E(n_b)E(n_c)\Phi(b,c) + \delta(b,c)E(n_b),$$




where

(5) $$\Phi(b,c) = \frac{N(N-1)\binom{m}{b\;c}(1-2p)^{m-b-c}p^{b+c}}{N\binom{m}{b}(1-p)^{m-b}p^b \cdot N\binom{m}{c}(1-p)^{m-c}p^c}$$

$$= \left(1 - \frac{1}{N}\right)\frac{\binom{m}{b\;c}}{\binom{m}{b}\binom{m}{c}}\left(\frac{1-2p}{(1-p)^2}\right)^m\left(\frac{1-p}{1-2p}\right)^{b+c}$$

$$= \left(1 - \frac{1}{N}\right)\frac{(m-c)^{(b)}}{m^{(b)}}\left(1 - \left(\frac{p}{1-p}\right)^2\right)^m\left(1 + \frac{p}{1-2p}\right)^{b+c},$$

where $u^{(b)} = u(u-1)\cdots(u-b+1)$ denotes a descending factorial with $b$
factors.

Notice that, if the $n_b$ were independently distributed in Poisson distributions,
the second moments would be given by the same formula with $\Phi(b,c) = 1$,
while if they were distributed like a multinomial sample from an infinite population
the second moments would be given by the same formula with $\Phi(b,c) = 1 - \frac{1}{N}$.

For small $p$, we have

$$\Phi(b,c) \approx \left(1 - \frac{1}{N}\right)\frac{(m-c)^{(b)}}{m^{(b)}},$$

and if $m$ is large compared to $b$ and $c$, this approaches the multinomial value

$$\Phi(b,c) \approx \left(1 - \frac{1}{N}\right).$$
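The relation between the second moments and the factor $\Phi(b,c)$ can be confirmed on a small case. The sketch below assumes $E(n_b n_c) = E(n_b)E(n_c)\Phi(b,c) + \delta(b,c)E(n_b)$ with $\Phi(b,c) = (1 - 1/N)\,\frac{(m-c)^{(b)}}{m^{(b)}}\,\big(1 - (p/(1-p))^2\big)^m \big(1 + p/(1-2p)\big)^{b+c}$, valid for $b + c \le m$ and $p < 1/2$, and compares it with a brute-force enumeration. Names and the test case ($m = 4$, $N = 3$, $p = 0.25$) are our own.

```python
from itertools import product
from math import comb, isclose

def falling(u, b):
    """Descending factorial u(u-1)...(u-b+1) with b factors."""
    out = 1
    for k in range(b):
        out *= u - k
    return out

def mean_nb(m, N, p, b):
    return N * comb(m, b) * (1 - p) ** (m - b) * p ** b

def phi(m, N, p, b, c):
    """Closed form for Phi(b, c); assumes b + c <= m and p < 1/2."""
    return ((1 - 1 / N) * falling(m - c, b) / falling(m, b)
            * (1 - (p / (1 - p)) ** 2) ** m
            * (1 + p / (1 - 2 * p)) ** (b + c))

def second_moment_via_phi(m, N, p, b, c):
    delta = 1 if b == c else 0
    return (mean_nb(m, N, p, b) * mean_nb(m, N, p, c) * phi(m, N, p, b, c)
            + delta * mean_nb(m, N, p, b))

def second_moment_enumerated(m, N, p, b, c):
    """E(n_b n_c) by listing every placement of m balls in N boxes or elsewhere."""
    total = 0.0
    for placement in product(range(N + 1), repeat=m):  # label N stands for "elsewhere"
        prob = 1.0
        occupancy = [0] * N
        for box in placement:
            if box == N:
                prob *= 1.0 - N * p
            else:
                prob *= p
                occupancy[box] += 1
        total += prob * occupancy.count(b) * occupancy.count(c)
    return total

m, N, p = 4, 3, 0.25
pairs = [(0, 2), (1, 1), (1, 2), (2, 2)]
checks = [isclose(second_moment_via_phi(m, N, p, b, c),
                  second_moment_enumerated(m, N, p, b, c), abs_tol=1e-12)
          for b, c in pairs]
print(checks)
```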


13. Variances and covariances. The variances and covariances are given by

$$\text{Variance}\,(n_b) = E(n_b n_b) - E(n_b)E(n_b) = E(n_b)\big(1 - (1 - \Phi(b,b))E(n_b)\big),$$

and

$$\text{Covariance}\,(n_b, n_c) = -(1 - \Phi(b,c))E(n_b)E(n_c).$$

Thus the covariance of $n_b$ and $n_c$ will vanish when, and only when, $\Phi(b,c) = 1$.
Let us suppose $pN = 1/\beta$ with $p$ small and $m$ and $N$ large, and see if $\Phi(b,c)$
can be unity. Since a preliminary calculation shows it to be reasonable, let us
put $m = \gamma N$. Then

$$\Phi(b,c) \approx (1 - \beta p)\frac{(\gamma N - c)^{(b)}}{(\gamma N)^{(b)}}(1 - p^2)^{\gamma N}(1 + p)^{b+c}.$$

An easy calculation shows that the ratio of descending factorials is nearly
$e^{-bc/\gamma N}$; making further natural approximations,

$$\ln \Phi(b,c) \approx -\beta p - \frac{\beta bc}{\gamma}p - \gamma N p^2 + (b + c)p,$$

and this may be written

$$\ln \Phi(b,c) \approx -\frac{\beta p}{4\gamma}\left(\left(\frac{2\gamma}{\beta} - b - c + \beta\right)^2 + 4\beta c - (b - \beta - c)^2\right),$$

and this vanishes for real $\gamma$ when and only when $|b - \beta - c| > \sqrt{4\beta c}$. This,
then, is the condition on $b$ and $c$ which permits the existence of two ratios of $m$
to $N$ so that for either ratio and large $N$ there will be no correlation between
$n_b$ and $n_c$.
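The two ratios of $m$ to $N$ can be computed explicitly. The sketch below rests on an assumption about the approximation used here: with $pN = 1/\beta$ and $m = \gamma N$, we take $\ln\Phi(b,c) \approx p\,(b + c - \beta - \beta bc/\gamma - \gamma/\beta)$, whose zeros are $\gamma = \tfrac{\beta}{2}\big(b + c - \beta \pm \sqrt{(b - \beta - c)^2 - 4\beta c}\big)$. The function names are ours.

```python
import math

def log_phi_over_p(b, c, beta, gamma):
    """Approximate ln(Phi(b, c)) / p under the stated assumptions."""
    return b + c - beta - beta * b * c / gamma - gamma / beta

def zero_correlation_ratios(b, c, beta):
    """Values of gamma = m / N at which the approximate covariance of n_b and n_c vanishes."""
    disc = (b - beta - c) ** 2 - 4 * beta * c
    if disc < 0:
        return []                       # no real ratio: the covariance cannot vanish
    root = math.sqrt(disc)
    return sorted(beta * (b + c - beta + sign * root) / 2 for sign in (-1, 1))

# beta = 1 corresponds to pN = 1, i.e. a fixed number of balls.
ratios = zero_correlation_ratios(5, 1, 1.0)
residuals = [log_phi_over_p(5, 1, 1.0, g) for g in ratios]
no_ratios = zero_correlation_ratios(2, 2, 1.0)
print(ratios, residuals, no_ratios)
```

Only positive values of $\gamma$ are meaningful as ratios of balls to boxes; the sketch returns the algebraic roots and leaves that check to the reader.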


14. Higher moments. To deal with the third moments, we need $E(x_{iq}y_{jq}z_{kq})$,
which is easily seen to behave as follows:

Relation of $i, j, k$ | Number of occurrences | Expectation of $x_{iq}y_{jq}z_{kq}$
$i = j = k$           | $N$                   | $1 - p + pxyz$
$i = j \ne k$         | $N(N-1)$              | $1 - 2p + pxy + pz$
$i = k \ne j$         | $N(N-1)$              | $1 - 2p + pxz + py$
$j = k \ne i$         | $N(N-1)$              | $1 - 2p + pyz + px$
different             | $N(N-1)(N-2)$         | $1 - 3p + px + py + pz$

Thus we have

$$\sum_{b,c,d} x^b y^c z^d E(n_b n_c n_d) = N(1 - p + pxyz)^m + N(N-1)(1 - 2p + pxy + pz)^m$$
$$+ N(N-1)(1 - 2p + pxz + py)^m + N(N-1)(1 - 2p + pyz + px)^m$$
$$+ N(N-1)(N-2)(1 - 3p + px + py + pz)^m$$

from which we can calculate all third moments.

In general if $\mathcal{C}$ is a decomposition of the product $xyz\cdots w$ into $a$ monomials
$u_1, u_2, \ldots, u_a$, where order is disregarded (for example: $xyz = (xz)y = (zx)y =
y(zx) = y(xz)$ is a single decomposition with $a = 2$, $u_1 = xz$, $u_2 = y$), then the
generating function becomes

$$\sum_{\mathcal{C}} N^{(a)}\big(1 + (u_1 + u_2 + \cdots + u_a - a)p\big)^m.$$


15. Poisson balls and equal boxes. To reach a Poisson distribution we let
$m \to \infty$ and $p \to 0$ so that $mNp = tN$, where $t$ is the average number of balls
per box in the Poisson distribution.

Since

$$\binom{m}{b}p^b \to \frac{t^b}{b!}, \qquad (1-p)^{m-b} \to e^{-t}$$

under these conditions, (3) becomes

(6) $$E(n_b) = N\frac{t^b}{b!}e^{-t},$$

and from (5) it follows that the limit of $\Phi(b,c)$ is $1 - \frac{1}{N}$, so that

(7) $$E(n_b n_c) = N(N-1)\frac{t^{b+c}}{b!\,c!}e^{-2t} + \delta(b,c)N\frac{t^b}{b!}e^{-t},$$

and hence

(8) $$\text{Variance}\,(n_b) = N\left(\frac{t^b}{b!}e^{-t}\right)\left(1 - \frac{t^b}{b!}e^{-t}\right),$$

(9) $$\text{Covariance}\,(n_b, n_c) = -N\left(\frac{t^b}{b!}e^{-t}\right)\left(\frac{t^c}{c!}e^{-t}\right).$$

Notice that these are the moments of the numbers of objects of types $b, c, \ldots$,
in a random sample of $N$ from an infinite population where the fraction of
$b$'s is $t^b e^{-t}/b!$, just as it should be.
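The passage to the Poisson limit can be watched numerically. The sketch below assumes the binomial mean $E(n_b) = N\binom{m}{b}(1-p)^{m-b}p^b$ and the Poisson mean $N t^b e^{-t}/b!$, holds $t = mp$ fixed, and lets $m$ grow; the values $N = 8$, $t = 1$, $b = 2$ are chosen only for illustration.

```python
from math import comb, exp, factorial

def mean_binomial(N, m, p, b):
    """E(n_b) for m balls, each entering a given box with probability p."""
    return N * comb(m, b) * (1 - p) ** (m - b) * p ** b

def mean_poisson(N, t, b):
    """E(n_b) in the Poisson case with t balls per box on average."""
    return N * t ** b * exp(-t) / factorial(b)

N, t, b = 8, 1.0, 2
relative_gaps = []
for m in (10, 100, 1000, 100000):
    p = t / m                            # keep mp = t fixed while m grows
    gap = abs(mean_binomial(N, m, p, b) / mean_poisson(N, t, b) - 1)
    relative_gaps.append(gap)
    print(m, gap)
```

The relative gap shrinks roughly like $1/m$, which is what the limiting argument leads one to expect.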


16. Fixed or binomial balls and varied boxes. We now consider the case
where the chance of any ball entering the $i$th box is $p_i$. We shall again not
restrict ourselves to the case $\sum_i p_i = 1$.

The expectation of $x_{iq}$ is immediately seen to be $(1 + p_i(x - 1)) =
(1 - p_i + p_i x)$, so that the generating function is

$$f(x) = \sum_i (1 - p_i + p_i x)^m$$

and the expectation of $n_b$ is

(10) $$E(n_b) = \binom{m}{b}\sum_i (1 - p_i)^{m-b}p_i^b = \binom{m}{b}\sum_i (1 - p_i)^m\left(\frac{p_i}{1 - p_i}\right)^b.$$

Following Stevens [4] with a slight modification, let us set $p_i = p(1 + \lambda_i)$,
where $p$ is the average of the $p_i$, so that $\sum_i \lambda_i = 0$. Then

$$(1 - p_i) = (1 - p(1 + \lambda_i)) = (1 - p)\left(1 - \frac{p\lambda_i}{1 - p}\right),$$

so that

$$\sum_i (1 - p_i)^{m-b}p_i^b = (1 - p)^{m-b}p^b \sum_i \left(1 - \frac{p\lambda_i}{1 - p}\right)^{m-b}(1 + \lambda_i)^b.$$

Expanding the summand, we find

$$1 + \left(b - \frac{(m-b)p}{1-p}\right)\lambda_i + \left\{\frac{(m-b)(m-b-1)p^2}{2(1-p)^2} - \frac{(m-b)bp}{1-p} + \frac{b(b-1)}{2}\right\}\lambda_i^2 + O(\lambda_i^3).$$

Hence, setting $\sum_i \lambda_i^2 = A$ (notice this is not the same as Stevens' $A$!), we have

$$E(n_b) = N\binom{m}{b}(1-p)^{m-b}p^b\left[1 + \frac{A}{2N}\left\{\frac{m-b}{m-b-1}\cdot\frac{(p(m-1) - b)^2}{(1-p)^2} - \frac{b(m-1)}{m-b-1}\right\}\right].$$

The expectation for all $p_i = p$ has been modified by multiplication by

(11) $$1 + \frac{A}{2N}\left\{\frac{m-b}{m-b-1}\cdot\frac{(p(m-1) - b)^2}{(1-p)^2} - \frac{b(m-1)}{m-b-1}\right\}$$


plus terms of higher order. For large $N$ and consequently small $p$ the quantity in
braces is nearly

$$b\left(b - \frac{m}{m-b}\right)$$

and more roughly is approximately $b^2$. Similarly, the expectations of second
moments are


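$$E(n_b n_c) = \binom{m}{b\;c}\sum_{i \ne j}(1 - p_i - p_j)^{m-b-c}p_i^b p_j^c + \delta(b,c)\binom{m}{b}\sum_i (1 - p_i)^{m-b}p_i^b$$

whence

(12) $$\Phi(b,c) = \frac{\binom{m}{b\;c}\sum_{i \ne j}(1 - p_i - p_j)^{m-b-c}p_i^b p_j^c}{\binom{m}{b}\binom{m}{c}\sum_i (1 - p_i)^{m-b}p_i^b \times \sum_j (1 - p_j)^{m-c}p_j^c}.$$
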
Making the same sort of expansion yields 


, = (m — ‘o" ( p y’ £ _ | ) 
(13) (b, ¢) = (1 ~ x) aw U-qow l-w Ve 


where terms in $\sum_i \lambda_i^3$ have been neglected (note that

$$\sum_{i \ne j} \lambda_i\lambda_j = -\sum_i \lambda_i^2 = -A),$$



and where 


_m—b—c N — = m —b 2 
pa ete PSs _—— meta et 


-{p(m — 1) — b}? 











2 







+4 m—b—c {0 ~ or m—c a — py 


m—b—c-—-I1WN m—ce—l 


-{p(m — 1) — cf? 


oq - 2  — 0) 


4 1 2be _ b’c 7 cb 
m—-b—c-—-1|\N-—-1 m—-b—-1 m—c—1)° 


This can be reduced to 


y= 2e(2 — £) + 0 »+0(%), 


b 
and for p = 1/N + O(p’) + 0(;). 
¥ = 2pbe + O(p’). 
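The size of the correction to the first moments in this section can be compared with the exact sum. The sketch below is a check under stated assumptions: it takes the exact mean $E(n_b) = \binom{m}{b}\sum_i (1-p_i)^{m-b}p_i^b$ and the second-order approximation $N\binom{m}{b}(1-p)^{m-b}p^b\big[1 + \tfrac{A}{2N}\{\tfrac{m-b}{m-b-1}\tfrac{(p(m-1)-b)^2}{(1-p)^2} - \tfrac{b(m-1)}{m-b-1}\}\big]$, with $p_i = p(1+\lambda_i)$, $\sum\lambda_i = 0$, $\sum\lambda_i^2 = A$. The particular $\lambda_i$ are invented for the illustration.

```python
from math import comb

def exact_mean(m, b, probs):
    """E(n_b) summed box by box with the true entry probabilities."""
    return comb(m, b) * sum((1 - pi) ** (m - b) * pi ** b for pi in probs)

def approx_mean(m, b, N, p, A):
    """Equal-box mean times the second-order correction factor; needs b <= m - 2."""
    brace = ((m - b) / (m - b - 1) * (p * (m - 1) - b) ** 2 / (1 - p) ** 2
             - b * (m - 1) / (m - b - 1))
    equal_boxes = N * comb(m, b) * (1 - p) ** (m - b) * p ** b
    return equal_boxes * (1 + A / (2 * N) * brace)

lambdas = [0.05, -0.05, 0.02, -0.02]          # hypothetical deviations, summing to zero
N, m, p = len(lambdas), 10, 0.2
probs = [p * (1 + lam) for lam in lambdas]
A = sum(lam ** 2 for lam in lambdas)
relative_errors = [abs(approx_mean(m, b, N, p, A) / exact_mean(m, b, probs) - 1)
                   for b in range(5)]
print(relative_errors)
```

Because the chosen $\lambda_i$ are symmetric, the neglected cubic terms cancel and the remaining error is of fourth order, so the agreement here is somewhat better than one should expect in general.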

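17. Poisson balls and varied boxes. To reach the Poisson limit, we let $m \to \infty$
and $p_i \to 0$ so that $mp_i = t_i$. The generating function for first moments becomes

$$f(x) = \sum_i e^{t_i(x-1)}$$

and the expectation of $n_b$ is

(15) $$E(n_b) = \sum_i \frac{t_i^b}{b!}e^{-t_i}.$$

If we set $t_i = t(1 + \lambda_i)$, this becomes

$$E(n_b) = \frac{t^b}{b!}e^{-t}\sum_i (1 + \lambda_i)^b e^{-t\lambda_i}.$$

The summand expands in the form

$$\left(1 + b\lambda_i + \frac{b(b-1)}{2}\lambda_i^2 + \frac{b(b-1)(b-2)}{6}\lambda_i^3 + \cdots\right) \times \left(1 - t\lambda_i + \frac{t^2\lambda_i^2}{2} - \frac{t^3\lambda_i^3}{6} + \cdots\right)$$

$$= 1 + (b - t)\lambda_i + \frac{(b-t)^2 - b}{2}\lambda_i^2 + \frac{(b-t)^3 - 3b(b-t) + 2b}{6}\lambda_i^3 + \cdots.$$

If $t$ is chosen as the average of the $t_i$ so that $\sum_i \lambda_i = 0$, the sum becomes

$$N + \frac{(b-t)^2 - b}{2}\sum_i \lambda_i^2 + \frac{(b-t)^3 - 3b(b-t) + 2b}{6}\sum_i \lambda_i^3 + \cdots.$$

Again setting $\sum_i \lambda_i^2 = A$ we have

(16) $$E(n_b) \approx \frac{t^b}{b!}e^{-t}\left(N + \big((b-t)^2 - b\big)\frac{A}{2}\right)$$

which can be written

$$E(n_b) \approx N\frac{t^b}{b!}e^{-t}\left(1 + \frac{A}{2N}\big((b-t)^2 - b\big)\right).$$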
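A quick numerical comparison shows how close the second-order formula is to the exact sum. The sketch assumes the exact mean $E(n_b) = \sum_i t_i^b e^{-t_i}/b!$ and the approximation $N\frac{t^b}{b!}e^{-t}\big(1 + \frac{A}{2N}((b-t)^2 - b)\big)$ with $t_i = t(1+\lambda_i)$; the deviations $\lambda_i$ below are invented for the illustration.

```python
from math import exp, factorial

def exact_mean_poisson(b, rates):
    """E(n_b) summed over boxes with their own Poisson rates t_i."""
    return sum(ti ** b * exp(-ti) / factorial(b) for ti in rates)

def approx_mean_poisson(b, N, t, A):
    """Equal-box Poisson mean with the second-order correction in A."""
    return N * t ** b * exp(-t) / factorial(b) * (1 + A / (2 * N) * ((b - t) ** 2 - b))

lambdas = [0.10, -0.10, 0.05, -0.05, 0.0]     # hypothetical deviations, summing to zero
N, t = len(lambdas), 1.5
rates = [t * (1 + lam) for lam in lambdas]
A = sum(lam ** 2 for lam in lambdas)
relative_errors = [abs(approx_mean_poisson(b, N, t, A) / exact_mean_poisson(b, rates) - 1)
                   for b in range(5)]
print(relative_errors)
```

Note that the correction changes sign with $b$: it is negative when $(b-t)^2 < b$ and positive otherwise, so unequal boxes thin out the middle of the distribution and fatten its ends.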
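The generating function for the second moments is

$$f(x)f(y) = \sum_{i \ne j} e^{t_i(x-1) + t_j(y-1)} + \sum_i e^{t_i(xy-1)}$$

so that the expectation of $n_b n_c$ is

(17) $$E(n_b n_c) = \sum_{i \ne j}\frac{t_i^b t_j^c}{b!\,c!}e^{-t_i - t_j} + \delta(b,c)\sum_i \frac{t_i^b}{b!}e^{-t_i}$$

which becomes

$$E(n_b n_c) = \frac{t^{b+c}}{b!\,c!}e^{-2t}\sum_{i \ne j}(1 + \lambda_i)^b(1 + \lambda_j)^c e^{-t(\lambda_i + \lambda_j)} + \delta(b,c)E(n_b),$$

whence we can derive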


(18) 





(b,c) 1 — = — - 


Thus 






(19) Variance (n,) + E(n;) — ae + 7 i. O° (E(m))’, 

w\' * ON 
oe ; . ] A : : 
(20) Covariance (m,n.) > — x ¢ 5V (b — t)(e — ») E(nm)E(n-). 


18. Boxes in a systematic square. Another case which it may be worthwhile
to write down arises when the boxes are systematically "rotated" under "spouts"
of different probability. That is, the number of balls $m$ is a multiple of the
number of boxes $N$, and the probability of the $q$th ball entering the $i$th box
depends on the value of $q - i$ taken modulo $N$. An example for $N = 3$ and
$m = 6$ follows:

Probabilities of entry

 Box | Ball 1 | Ball 2 | Ball 3 | Ball 4 | Ball 5 | Ball 6
  1  | $p_0$  | $p_1$  | $p_2$  | $p_0$  | $p_1$  | $p_2$
  2  | $p_2$  | $p_0$  | $p_1$  | $p_2$  | $p_0$  | $p_1$
  3  | $p_1$  | $p_2$  | $p_0$  | $p_1$  | $p_2$  | $p_0$

If $m$ is a multiple of $N$ and the subscript $r$ runs through $0, 1, 2, \ldots, N - 1$, then the
expectation of $f(x)$ becomes

$$\sum_i \prod_q E(x_{iq}) = N\Big\{\prod_r (1 - p_r + p_r x)\Big\}^{m/N}.$$



Thus first moments, and by proceeding similarly higher moments, are available 
for this case also. 
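The first moments for the systematic square can be read off by multiplying out the polynomial. The sketch below assumes the generating function $N\{\prod_r (1 - p_r + p_r x)\}^{m/N}$ and checks it against a brute-force enumeration for $N = 3$, $m = 6$. The spout probabilities are invented and chosen to sum to 1, so that every ball lands in some box; all function names are ours.

```python
from itertools import product
from math import isclose

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def expected_counts_generating(spouts, m):
    """Coefficients of N * (prod_r (1 - p_r + p_r x))^(m / N), i.e. E(n_b) for b = 0..m."""
    N = len(spouts)
    one_cycle = [1.0]
    for pr in spouts:
        one_cycle = poly_mul(one_cycle, [1 - pr, pr])
    poly = [1.0]
    for _ in range(m // N):
        poly = poly_mul(poly, one_cycle)
    return [N * coeff for coeff in poly]

def expected_counts_enumerated(spouts, m):
    """E(n_b) by listing all placements; ball q enters box i with probability p_((q - i) mod N)."""
    N = len(spouts)
    expected = [0.0] * (m + 1)
    for placement in product(range(1, N + 1), repeat=m):
        prob = 1.0
        occupancy = [0] * N
        for q, box in enumerate(placement, start=1):
            prob *= spouts[(q - box) % N]
            occupancy[box - 1] += 1
        for balls_in_box in occupancy:
            expected[balls_in_box] += prob
    return expected

spouts = [0.5, 0.3, 0.2]     # hypothetical p_0, p_1, p_2
from_polynomial = expected_counts_generating(spouts, 6)
from_enumeration = expected_counts_enumerated(spouts, 6)
agree = all(isclose(u, v, abs_tol=1e-12) for u, v in zip(from_polynomial, from_enumeration))
print(agree, from_polynomial)
```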


PART III 


THE POISSON CASE 


19. The Poisson case with equal boxes. The Poisson case is obtained in the
limit as $m \to \infty$ and $p \to 0$ with $pm = t$. We wish to show that, in the limit, the
numbers of balls in the different boxes are independent. Let $k_1, k_2, \ldots, k_N$ be
the number of balls in the first, second, ..., $N$th box, respectively. Then the
distribution of the $k$'s is given by, where we write $k = k_1 + k_2 + \cdots + k_N$,

$$\frac{m^{(k)}}{k_1!\,k_2!\cdots k_N!}\,p^k(1 - Np)^{m-k} = \frac{m^{(k)}}{m^k}\cdot\frac{(1 - Np)^{m-k}}{e^{-Nmp}}\cdot\prod_{i=1}^{N}\frac{(mp)^{k_i}e^{-mp}}{k_i!}.$$

Now the first two fractions clearly approach unity in the limit, and the
independence is proved.

Since the number of balls in each box has an independent Poisson distribution,
the distribution of the numbers of boxes each with exactly $b$ balls is that of a
random sample of $N$ from an infinite population; namely, it is a multinomial
distribution with probabilities

$$\frac{(mp)^b e^{-mp}}{b!}.$$


REFERENCES

[1] H. H. Newcombe, "Delayed phenotypic expression of spontaneous mutations in Escherichia coli," Genetics, Vol. 33 (1948), pp. 447-476.

[2] I. Opatowski, "Chain processes and their biophysical applications: Part I. General Theory," Bulletin of Mathematical Biophysics, Vol. 7 (1945), pp. 161-180.

[3] V. Romanovsky, "Su due problemi di distribuzione casuale," Giornale dell'Istituto Italiano degli Attuari, Vol. 5 (1934), pp. 196-218.

[4] W. L. Stevens, "Significance of grouping," Annals of Eugenics, Vol. 8 (1937-1938), pp. 57-59.





THE POWER OF THE CLASSICAL TESTS ASSOCIATED WITH THE 
NORMAL DISTRIBUTION 


By J. Wolfowitz
Columbia University 


Summary. The present paper is concerned with the power function of the 
classical tests associated with the normal distribution. Proofs of Hsu, Simaika, 
and Wald are simplified in a general manner applicable to other tests involving 
the normal distribution. The set theoretic structure of several tests is charac- 
terized. A simple proof of the stringency of the classical test of a linear hypothesis 
is given. 


1. Introduction. The present paper is concerned with the optimum properties, 
from the power function viewpoint, of the classical tests associated with the 
normal distribution. In 1941 Hsu [2] proved the result stated in Section 2 below, 
which is concerned with the general linear hypothesis (in this connection his 
paper [1] of 1938 will be of interest). Also in 1941 Simaika [3] proved similar 
results for the tests based on the multiple correlation coefficient and Hotelling’s 
generalization of Student's $t$. In 1942, Wald [4] gave a generalization of Hsu's
result. 

In the present paper we give short and simple proofs of almost all these 
results, and a simple proof of the stringency property of the analysis of variance 
(Section 5). These proofs rest on theorems which characterize the set theoretic 
structure of the tests. Thus, while the proofs of Hsu, Simaika and Wald are 
rather elaborate and each problem is essentially attacked de novo, the methods
of the present paper are in effect applicable to the classical tests based on the 
normal distribution. For these tests it will not be difficult to demonstrate the 
analogues of Theorems 1 and 3, and of the results of Hsu, Simaika, and Wald. 
In the present paper we first treat the general linear hypothesis, because it is the 
simplest problem, its solution is easiest to describe, and it admits Wald’s integra- 
tion theorem. Multivariate analogues of the latter are rather artificial and not as 
simple. We then discuss the problem of the multiple correlation coefficient, 
because it seems to be more difficult than that of Hotelling's $T$ and, indeed, to
include all the essential multivariate difficulties. Theorems 6 and 7 are the 
analogues of 1 and 3, respectively, while Theorem 9 describes the essential 
property of the power function which is of interest to us. In other multivariate 
problems one will prove the analogues of Theorems 6, 7 and 9. A generally 
inclusive formulation is no doubt possible. Theorems 5 and 9 are slightly more 
general than the theorems of Hsu and Simaika. 

Many of the statements below may not be valid on exceptional sets of measure
zero. Usually this is so stated, but sometimes, for reasons of brevity or to avoid 
repetition, this qualification may be omitted. The reader will have no difficulty 
supplying it wherever necessary. 




The author is indebted to Erich L. Lehmann of the University of California,
who carefully read a first version of this paper. Theorem 4 below was arrived at 
independently by Professor Lehmann, with a somewhat different proof. 


2. The general linear hypothesis. In canonical form the general linear hypothesis
may be stated as follows: The chance variables

$$X_1, X_2, \ldots, X_{k+l}$$

have at $x_1, \ldots, x_{k+l}$ the density function

(2.1) $$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{k+l}\exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{k}(x_i - \eta_i)^2 + \sum_{i=k+1}^{k+l}x_i^2\right)\right] = f(\eta, \sigma)$$

with $\sigma, \eta_1, \ldots, \eta_k$ all unknown.
Let $\eta$ be the vector $(\eta_1, \ldots, \eta_k)$. The null hypothesis $H_0$ states that

$$\eta_1 = \cdots = \eta_k = 0$$

and is to be tested with constant size $\alpha < 1$ (identically in $\sigma$).

Let $D$ be any admissible critical region for testing $H_0$. If $A$ is any event let
$P\{A \mid \eta, \sigma\}$ denote the probability of $A$ when $\eta$ and $\sigma$ are the parameters of
(2.1). We have then

$$P\{D \mid 0, \sigma\} = \alpha$$

identically in $\sigma$, where 0 is the vector with $k$ components all of which are zero.
We now prove a property which characterizes all $D$. This theorem is due to
Neyman and Pearson [12], and is given here only for completeness.

THEOREM 1. The fraction of the surface area of the sphere

$$\sum_{i=1}^{k+l} x_i^2 = c^2$$

which lies in $D$ is $\alpha$ for almost all $c$.

PROOF. Let $a$ be any positive integer, $h$ a positive parameter, and $\psi(y)$ a
measurable function of $y$ defined for $y > 0$ and such that $0 \le \psi(y) \le 1$. In view
of the distribution of $\sum X_i^2$, it will be enough to prove that, if

$$\frac{h^{a+1}}{\Gamma(a+1)}\int_0^\infty \psi(y)\,y^a e^{-hy}\,dy = \alpha$$

identically for all positive $h$, then

$$\psi(y) = \alpha \text{ for almost all } y.$$

Write

$$\frac{1}{\alpha\,\Gamma(a+1)}\int_0^\infty \psi(y)\,y^a e^{-hy}\,dy = h^{-(a+1)}.$$

Differentiating both members $k$ times with respect to $h$ and then setting $h = 1$
we obtain the following result. The function

$$\frac{\psi(y)\,y^a e^{-y}}{\alpha\,\Gamma(a+1)}$$

is a density function with $k$th moment

$$\mu_k = (a+1)(a+2)\cdots(a+k).$$

The moments $\mu_k$ are the moments of the density function

$$\frac{y^a e^{-y}}{\Gamma(a+1)}.$$

They satisfy the Carleman criterion [5, p. 19, Th 1.10], and hence no essentially
different distribution can have these moments. This proves the desired result.
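The two facts used at the end of this proof lend themselves to a numerical illustration. The sketch below assumes the density $y^a e^{-y}/\Gamma(a+1)$ on $y > 0$, whose $k$th moment is $\Gamma(a+k+1)/\Gamma(a+1) = (a+1)(a+2)\cdots(a+k)$, and the form of Carleman's criterion for distributions on the half-line, namely that $\sum_k \mu_k^{-1/(2k)}$ diverges. Partial sums cannot prove divergence, of course; they only show the slow, steady growth one expects. The choice $a = 2$ is arbitrary.

```python
import math

def gamma_moment(a, k):
    """k-th moment of the density y^a e^{-y} / Gamma(a + 1) on y > 0."""
    return math.gamma(a + k + 1) / math.gamma(a + 1)

def rising_product(a, k):
    """(a + 1)(a + 2)...(a + k)."""
    out = 1.0
    for j in range(1, k + 1):
        out *= a + j
    return out

def carleman_partial_sum(a, terms):
    """Partial sum of mu_k^(-1/(2k)), computed with log-gamma to avoid overflow."""
    total = 0.0
    for k in range(1, terms + 1):
        log_mu = math.lgamma(a + k + 1) - math.lgamma(a + 1)
        total += math.exp(-log_mu / (2 * k))
    return total

a = 2
moments_match = all(math.isclose(gamma_moment(a, k), rising_product(a, k))
                    for k in range(1, 9))
s_1000 = carleman_partial_sum(a, 1000)
s_4000 = carleman_partial_sum(a, 4000)
print(moments_match, s_1000, s_4000)
```

The terms behave like $\sqrt{e/k}$, so quadrupling the number of terms roughly doubles the partial sum.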
THEOREM 2 (Wald). Among all tests of the general linear hypothesis the analysis
of variance test has the property that, for all positive $d$, the integral of its power on the
surface $\eta^2 = d^2$ is a maximum.

PROOF. Let $c$ be any positive number. We have only to show that, if we allocate
to the critical region $D$ of the test the fraction $\alpha$ of the surface area of the sphere

(2.3) $$\sum_{i=1}^{k+l} x_i^2 = c^2$$

for which

$$C = \frac{\sum_{i=1}^{k} x_i^2}{\sum_{i=k+1}^{k+l} x_i^2}$$

is as large as possible, and if we do this for all $c$, the desired maximum of the
integral of the power will be achieved. If $C$ is as large as possible so is

$$\sum_{i=1}^{k} x_i^2 = \frac{c^2 C}{1 + C}.$$

1 
Let $a_1, \ldots, a_{k+l}$ be any point on the sphere (2.3). Let $db$ be the differential of
area on the surface $\eta^2 = d^2$. Then

(2.4) $$\int_{\eta^2 = d^2} f(\eta, \sigma)\,db = (\sqrt{2\pi}\,\sigma)^{-(k+l)}\exp\left\{-\frac{c^2 + d^2}{2\sigma^2}\right\}\int_{\eta^2 = d^2}\exp\left\{\frac{(\eta)'z}{\sigma^2}\right\}db,$$

where $z$ is the vector $(a_1, \ldots, a_k)$ and $(\eta)'z$ is the scalar product of the two
vectors. This last integral is easily seen to depend only upon $|z|$ and to be
monotonically increasing in $|z|$. This proves the theorem.
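The monotonicity claimed for the last integral can be seen numerically in the simplest non-trivial case. The sketch below is an illustration only: it takes $k = 2$, so that the surface $\eta^2 = d^2$ is a circle of radius $d$ with $db = d\,d\theta$, and evaluates $\int \exp\{(\eta)'z/\sigma^2\}\,db$ by the midpoint rule for several values of $|z|$. By rotational symmetry only $|z|$ matters, so $z$ is placed along the first axis. The function name and parameter values are ours.

```python
import math

def circle_integral(d, z_norm, sigma, steps=2000):
    """Integral of exp(eta . z / sigma^2) over the circle |eta| = d, with |z| = z_norm."""
    h = 2 * math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h                    # midpoint of the j-th arc
        total += math.exp(d * z_norm * math.cos(theta) / sigma ** 2)
    return d * h * total                         # arc-length element is d * d(theta)

d, sigma = 1.5, 1.0
z_norms = [0.0, 0.5, 1.0, 2.0, 3.0]
values = [circle_integral(d, r, sigma) for r in z_norms]
print(values)
```

At $|z| = 0$ the integral is just the circumference $2\pi d$, and the values increase with $|z|$ from there.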




COROLLARY (Hsu). Among all tests of the general linear hypothesis whose power
is a function of $\lambda$ only, the analysis of variance is the most powerful.


3. The set theoretic structure of tests whose power is a function only of $\eta^2/\sigma^2$.
Wald's result (Theorem 2) cannot always be extended, in its simple form, to
tests involving the multivariate normal distribution, but this can be done with
Hsu's theorem (corollary to Theorem 2). In order to see what is involved we
shall investigate the set theoretic structure of tests of the general linear hypothesis
whose power is a function only of $\eta^2/\sigma^2$.





Let $g(x_1, \ldots, x_k)$ be the set of points in the region $D$ whose first $k$ coordinates
are $x_1, \ldots, x_k$. Let $A(x_1, \ldots, x_k, \sigma)$ be the integral of

$$(2\pi\sigma^2)^{-l/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{j=k+1}^{k+l} x_j^2\right]$$

with respect to $x_{k+1}, \ldots, x_{k+l}$, taken over $g(x_1, \ldots, x_k)$. We first prove the
following:

LEMMA. Suppose the power of $D$ is a function only of $\eta^2/\sigma^2$. Then for two points

$$x_1, \ldots, x_k$$

and

$$x_1', \ldots, x_k'$$

such that

(3.1) $$\sum_{i=1}^{k} x_i^2 = \sum_{i=1}^{k} x_i'^2$$

we have

(3.2) $$A(x_1, \ldots, x_k, \sigma) = A(x_1', \ldots, x_k', \sigma)$$

identically in $\sigma$, with the exception of a set of measure zero.

PROOF. Suppose the statement is false. Then under some orthogonal transformation
$T$ of $x_1, \ldots, x_k$ the region $D$ would go over into a region $D^*$ with the
following property$^1$: Let $A^*(x_1, \ldots, x_k, \sigma)$ have the same definition for the region
$D^*$ as $A(x_1, \ldots, x_k, \sigma)$ has for $D$. Then on a set of positive measure we would
have

(3.3) $$A(x_1, \ldots, x_k, \sigma) \ne A^*(x_1, \ldots, x_k, \sigma).$$

We shall now show that (3.3) results in a contradiction. We have

(3.4) $$P\{D \mid \eta, \sigma\} = P\{D^* \mid T\eta, \sigma\}$$

identically in $\eta$. By the property of the region $D$, therefore, we have

$$P\{D \mid \eta, \sigma\} = P\{D \mid T\eta, \sigma\}$$


1 The situation here is similar to that described in footnote 3. 







and hence

(3.5) $$P\{D \mid \eta, \sigma\} = P\{D^* \mid \eta, \sigma\}$$

identically in $\eta$. Thus we obtain

(3.6) $$\int (2\pi\sigma^2)^{-k/2} A(x_1, \ldots, x_k, \sigma)\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i - \eta_i)^2\right]dx_1 \cdots dx_k$$
$$= \int (2\pi\sigma^2)^{-k/2} A^*(x_1, \ldots, x_k, \sigma)\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i - \eta_i)^2\right]dx_1 \cdots dx_k,$$

with the integrations taking place over the entire space. Differentiating both
members with respect to the components of $\eta$ and setting $\eta = 0$, we obtain that
the two density functions (for fixed $\sigma$)

$$(2\pi\sigma^2)^{-k/2}\,\alpha^{-1} A(x_1, \ldots, x_k, \sigma)\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k} x_i^2\right]$$

$$(2\pi\sigma^2)^{-k/2}\,\alpha^{-1} A^*(x_1, \ldots, x_k, \sigma)\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k} x_i^2\right]$$

have identical moments. We shall now argue that these moments satisfy the
conditions of Cramér and Wold [7, Th. 2], so that the two density functions are
essentially the same, in contradiction to (3.3). The Cramér-Wold theorem states
the following: Let $Y_1, \ldots, Y_k$ be $k$ chance variables with a joint distribution
function, and write

$$\lambda_{2n} = \sum_{i=1}^{k} E(Y_i^{2n}).$$

Then the divergence of the series

$$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/2n}$$

is sufficient to ensure that there exists essentially only one distribution which has
these moments. We notice that the factor $1/\alpha$ of course makes no difference.
If we set $A(x_1, \ldots, x_k, \sigma)$ and $A^*(x_1, \ldots, x_k, \sigma)$ both identically unity and
consider the resulting moments which enter into the $\lambda_{2n}$, we see that these
moments satisfy the Cramér-Wold condition. Now $A$ and $A^*$ are $\le 1$. Thus,
using the true value of $A$ can serve only to increase the value of $\lambda_{2n}^{-1/2n}$, so that
the series will diverge a fortiori. This proves the lemma.

The following theorem helps to describe the set theoretic structure of tests
whose power is a function only of $\lambda = \eta^2/\sigma^2$:

THEOREM 3. Let $D$ be a test whose power is a function only of $\lambda$. Let $u$ be any
positive number, and $D(x_1, \ldots, x_k, u)$ be the fraction of the "area" of the sphere
$\sum_{i=k+1}^{k+l} x_i^2 = u^2$ occupied by points which are in $D$ and whose first $k$ coordinates
are $x_1, \ldots, x_k$. If

(3.7) $$\sum_{i=1}^{k} x_i^2 = \sum_{i=1}^{k} x_i'^2$$

then, except on a set of measure zero,

(3.8) $$D(x_1, \ldots, x_k, u) = D(x_1', \ldots, x_k', u).$$

PROOF. We shall show that, if the power of $D$ is a function only of $\lambda$, the
failure of (3.7) to imply (3.8) would contradict the preceding lemma. Suppose
then that (3.8) is not true on a set of positive measure. Under some orthogonal
transformation on $x_1, \ldots, x_k$ we obtain$^2$ a function $D^*(x_1, \ldots, x_k, u)$ which
differs from $D(x_1, \ldots, x_k, u)$ on a set of positive measure and such that, for
almost every $x_1, \ldots, x_k$,

$$A(x_1, \ldots, x_k, \sigma) = K\int_0^\infty D(x_1, \ldots, x_k, u)\,\sigma^{-l}u^{l-1}e^{-u^2/2\sigma^2}\,du$$

$$= K\int_0^\infty D^*(x_1, \ldots, x_k, u)\,\sigma^{-l}u^{l-1}e^{-u^2/2\sigma^2}\,du$$

identically in $\sigma$, where $K$ is a suitable constant of no interest to us. Multiplying
by $\sigma^l$, differentiating repeatedly under the integral sign with respect to $\sigma$, and
setting $\sigma = 1$, we obtain the result that the two density functions in $u$,

$$\frac{K\,D(x_1, \ldots, x_k, u)}{A(x_1, \ldots, x_k, 1)}\,u^{l-1}e^{-u^2/2}$$

$$\frac{K\,D^*(x_1, \ldots, x_k, u)}{A(x_1, \ldots, x_k, 1)}\,u^{l-1}e^{-u^2/2}$$

are identical except perhaps on a set of measure zero. This contradiction proves
the theorem.

THEOREM 4. A necessary and sufficient condition that the power of $D$ be a function
of $\lambda$ only, is that, with the usual exception of a set of measure zero, $D(x_1, \ldots, x_k, u)$
be a function only of

$$\frac{\sum_{i=1}^{k} x_i^2}{u^2}.$$


The proof of this theorem is not essentially different from that of the preceding
theorem, and we shall therefore sketch it only briefly. Let $Z$ be a transformation
on $(x_1, \ldots, x_k, u) = (x, u)$ which consists of a rotation of the vector $x$, followed
by a multiplication of $u$ and the components of $x$ by a positive constant $c$. If
$D(x, u)$ is not a function of $\sum_{i=1}^{k} x_i^2/u^2$ alone, then, just as before$^3$, we can use some

2 See footnote 1.
3 This statement implies that a function of $x_1, \ldots, x_k, u$, which is invariant to within
sets of measure zero under all transformations $Z$ (the exceptional set may depend on the









transformation $Z$ to give us a function $D^*(x, u)$ such that

$$D(x, u) \ne D^*(x, u)$$

on a set of positive measure, while

$$E\,D(x, u) = E\,D^*(x, u)$$

identically in $\eta, \sigma$. This yields a contradiction in the usual manner and proves
the necessity of the condition.

To prove sufficiency, write $D(x, u) = \nu\left(\sum_{i=1}^{k} x_i^2/u^2\right) = \nu(v)$. Let $\chi(v, \eta, \sigma)$ be the
density function of $v$. Then

$$P\{D \mid \eta, \sigma\} = \int_0^\infty \nu(v)\,\chi(v, \eta, \sigma)\,dv.$$

By hypothesis, $\nu(v)$ is a function only of $v$. We know [9, p. 140, eq. 101] that
$\chi(v, \eta, \sigma)$ is a function only of $v$ and $\lambda$. Hence $P\{D \mid \eta, \sigma\}$ is a function only of $\lambda$.
This completes the proof of the theorem.

THEOREM 5. Among all tests of the general linear hypothesis which have the 
properties described in the conclusions of Theorems 1 and 3, the classical analysis 
of variance test 1s the most powerful. 

We shall omit the proof of this theorem, which is very similar to that of the 
more difficult Theorem 9 below. 

Theorem 4 above shows that there exist regions $D$ which satisfy the conclusions
of Theorems 1 and 3 and such that $P\{D \mid \eta, \sigma\}$ is not a function of $\lambda$ alone. It
follows that the content of Theorem 5 is greater than that of Hsu's theorem
(Corollary to Theorem 2).

It is instructive to note that Hsu's theorem follows almost immediately from
Theorem 4 and the form of $\chi(v, \lambda)$. For let $\lambda$ be fixed but arbitrary. One verifies
immediately from the form of $\chi(v, \lambda)$ that

$$\frac{\chi(v, \lambda)}{\chi(v, 0)}$$

is, for fixed $\lambda$, a monotonically increasing function of $v$. This, by Neyman's
lemma, immediately proves Hsu's result.


4. The multiple correlation coefficient. We shall now apply our methods to a
multivariate test. For typographic ease we shall conduct the discussion for the

transformation), is a function of $\sum_{i=1}^{k} x_i^2/u^2$ except on a set of measure zero. This statement
would be completely trivial were it not for the exceptional sets; in any case it must be well
known to set theorists. The author constructed an unnecessarily long proof of it, and
believes that a more expeditious proof can be constructed using the ideas of [11, page 91,
Theorem 11.1, and page 318, p. 7]. Professor C. M. Stein of the University of California
has informed the author that this result is a special case of one established by himself and
G. H. Hunt in a forthcoming paper. For these reasons the proof is omitted. (See also [13,
page 27, Lemma 9.1].)






case of three variates, but the reader will observe that the procedure is really 
perfectly general. 


The chance variables $\{Y_{ij}\}$, $i = 1, 2, 3$, $j = 1, \cdots, n$, have the density
function

(4.1) $g(B) = (2\pi)^{-3n/2}\, |B|^{n/2} \exp\left\{-\tfrac{1}{2} \sum_{j=1}^{n} \sum_{i,l=1}^{3} b_{il}\, y_{ij}\, y_{lj}\right\}$

where 1) $B = \{b_{ij}\}$ is a positive definite (symmetric) $3 \times 3$ matrix, 2) $y_{ij}$ is the
value assumed by $Y_{ij}$. The null hypothesis $H_0$ asserts that a given multiple
correlation coefficient is zero, say that of $Y_1$ with $Y_2$ and $Y_3$, i.e.,

(4.2) $b_{12} = b_{21} = b_{13} = b_{31} = 0.$

The test is to be made on the level of significance $\alpha$, i.e., if $B_0$ is any matrix
which satisfies (4.2), and if $G$ is a critical region for testing $H_0$, then

(4.3) $P\{G \mid B_0\} = \alpha$

where the symbol in the left member means the probability of $G$ according
to $g(B_0)$.


Write

$n s_{ij} = \sum_{k=1}^{n} y_{ik}\, y_{jk}, \qquad S = \begin{pmatrix} s_{22} & s_{23} \\ s_{32} & s_{33} \end{pmatrix}.$

Let $M(c_{11}, C)$ be the manifold in the $3n$-space of $y_{11}, \cdots, y_{3n}$
where $s_{11} = c_{11}$, $S = C$. First we prove the following:

THEOREM 6. Any region $G$ which satisfies (4.3) must have the property that the
fraction of the area of $M(c_{11}, C)$ which lies in $G$ is $\alpha$, for any positive $c_{11}$ and any
positive definite $2 \times 2$ matrix $C = \{c_{ij}\}$. (We remind the reader that exceptional
sets of measure zero are not precluded.)

PROOF. Let $V(c_{11}, C)$ be the fraction of the area of $M(c_{11}, C)$ in $G$. Recall
equation (4.3) and the fact that $s_{11}, s_{22}, s_{23}, s_{33}$ are sufficient statistics for the
elements of $B_0$. On the manifold $M(c_{11}, C)$ the conditional density is uniform.
Employing Wishart's distribution [6] we conclude that


' 


(4.4) $K' \int V(s_{11}, S)\, |B_0|^{n/2}\, |S|^{(n-3)/2}\, s_{11}^{(n-2)/2} \exp\left\{-\tfrac{n}{2}\left(b_{11} s_{11} + b_{22} s_{22} + 2 b_{23} s_{23} + b_{33} s_{33}\right)\right\} ds_{11}\, ds_{22}\, ds_{23}\, ds_{33} \equiv \alpha$

where $K'$ is a suitable constant which need not concern us. Here the symbol
$\equiv$ means identically in $b_{11}, b_{22}, b_{23}, b_{33}$, provided only that $b_{11} > 0$, $b_{22} > 0$,
$b_{22} b_{33} - b_{23}^2 > 0$. Of course $s_{11}$ is distributed independently of $s_{22}, s_{23}, s_{33}$.
Proceeding as in section 2, we can, by differentiation with respect to the $b$'s,
obtain all the moments of the $s_{ij}$'s. Now let the $b$'s take any admissible constant
values. The moments of the $s_{ij}$'s are then seen to satisfy the criterion of Cramér
and Wold [7, Th. 2], and consequently essentially uniquely determine the
distribution of the $s_{ij}$. The desired conclusion follows as before.

The six parameters which uniquely determine the trivariate normal distribution
(of $Y_1, Y_2, Y_3$) with zero means may be taken to be the following:

1) The covariance matrix $\{\sigma_{ij}\}$, $i, j = 2, 3$, of $Y_2$ and $Y_3$.

2) The partial regression coefficients $\beta_2, \beta_3$, of $Y_1$ on $Y_2$ and $Y_3$. These are
defined as follows: Let $E(Y_1 \mid Y_2 = y_2, Y_3 = y_3)$ denote the conditional expected
value of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$. Then

$E(Y_1 \mid Y_2 = y_2, Y_3 = y_3) = \beta_2 y_2 + \beta_3 y_3.$

3) The conditional variance $\omega^2$ of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$.

The population multiple correlation coefficient $R$ of $Y_1$ with $Y_2$ and $Y_3$ is then
defined by

$\frac{R^2 \omega^2}{1 - R^2} = \beta_2^2 \sigma_{22} + 2 \beta_2 \beta_3 \sigma_{23} + \beta_3^2 \sigma_{33}.$

The six parameters above may be chosen arbitrarily, provided only that $\{\sigma_{ij}\}$
is positive definite. $R$ and $\omega$ are, by definition, non-negative.


Let $y_i$ be the column vector $y_{i1}, \cdots, y_{in}$; let $y_i'$ be its transpose, and let $y$
denote the point $y_{11}, y_{12}, \cdots, y_{1n}, y_{21}, \cdots, y_{3n}$ in $3n$-space. Let $z(y) =
z(y_1, y_2, y_3)$ be the component of $y_1$ in the plane of $y_2$ and $y_3$; let $r = |z(y)|$ and $\theta$
the angle between $z$ and $y_2$, measured positively say in the direction of $y_3$.
Finally let $h$ be the absolute value of the vector $y_1 - z(y_1, y_2, y_3)$.
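The geometric quantities just defined can be computed directly from three sample vectors. The following is a minimal sketch, not part of the original argument: the function names `dot` and `decompose` and the sample vectors are hypothetical, and the last line assumes that the sample multiple correlation under the zero-mean model equals $r/\sqrt{r^2 + h^2}$, which is how the later remark that a function of $h/r$ is a function of $R$ is read here.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose(y1, y2, y3):
    """Return (r, h, theta) for the projection z of y1 onto the plane of y2, y3."""
    # Solve the 2x2 normal equations for z = a*y2 + b*y3.
    g22, g23, g33 = dot(y2, y2), dot(y2, y3), dot(y3, y3)
    t2, t3 = dot(y1, y2), dot(y1, y3)
    det = g22 * g33 - g23 * g23          # assumed nonzero (y2, y3 not collinear)
    a = (t2 * g33 - t3 * g23) / det
    b = (t3 * g22 - t2 * g23) / det
    z = [a * u + b * v for u, v in zip(y2, y3)]
    resid = [u - w for u, w in zip(y1, z)]
    r = math.sqrt(dot(z, z))             # length of the component in the plane
    h = math.sqrt(dot(resid, resid))     # length of the component orthogonal to it
    # Angle from y2 to z, counted positive towards y3 (the sign follows coefficient b).
    cos_t = dot(z, y2) / (r * math.sqrt(g22))
    theta = math.copysign(math.acos(max(-1.0, min(1.0, cos_t))), b)
    return r, h, theta

# Hypothetical sample with n = 4 observations.
y1, y2, y3 = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 2.0]
r, h, theta = decompose(y1, y2, y3)
sample_R = r / math.sqrt(r * r + h * h)  # assumed form of the sample coefficient
```

Since $z$ and $y_1 - z$ are orthogonal, $h^2 + r^2$ equals the squared length of $y_1$, which is the relation used later in the proof of Theorem 9.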

We intend now to investigate the set theoretic structure of tests whose power
is a function only of $R$, and for this purpose prove the following:

THEOREM 7. Let $H$ be a region whose power is a function only of $R$. Let
$V(h, r, \theta, s_{22}, s_{23}, s_{33})$ be the fraction of the "volume" of the manifold on which
$h, r, \theta, s_{22}, s_{23}, s_{33}$ are fixed which is contained in $H$. With the usual exception of a
set of measure zero, for fixed $h, r, s_{22}, s_{23}, s_{33}$, the quantity $V$ above is constant for
all $\theta$.

Later, after this theorem is proved, we shall write $V$ without exhibiting $\theta$.
This procedure is justified by Theorem 7.

PROOF. Suppose the theorem false, and proceed as in Theorem 3. A suitable⁴
rotation of the radius vector $z(y)$ implies an orthogonal transformation $T$ on the
generic point $y$ which leaves $h, r, s_{22}, s_{23}$, and $s_{33}$ unaltered, and takes the region $H$
into a region $H^*$ such that $H$ and $H^*$ differ on a set of positive measure. $T$ leaves
$R$ invariant, hence leaves invariant $R$ which uniquely determines the distribution


4 See footnote 1. 



































of $R$. Hence an argument almost the same as that which led us to (3.5) yields the
conclusion that the power of $H$ and the power of $H^*$ are equal, identically in $B$.
Proceeding as in Theorem 3, we obtain two essentially different density functions
in $h, r, \theta, s_{22}, s_{23}, s_{33}$, whose integrals over the entire space are identical in the
elements of $B$. From these functions we obtain two different density functions in
$s_{ij}$ ($i, j = 1, 2, 3$), with identical moments (obtained by differentiation with
respect to the elements of $B$). The rest of the proof is essentially no different
from that of Theorem 3.

THEOREM 8. In order that the power of $H$ be a function of $R$ alone, it is necessary
and sufficient that, with the usual exception of a set of measure zero, $V(h, r, s_{22}, s_{23}, s_{33})$
be a function only of $h/r$ (i.e., of $R$).

The proof of this theorem is essentially the same as the proof of Theorem 4.
The place of the transformation $Z$ is taken by one which consists of any linear
transformation on the vectors $y_2$ and $y_3$, the addition of a constant angle to $\theta$
(rotation of $z(y)$), and multiplication of the vector $y_1$ by a positive scalar $c$.
This transformation leaves $R$ invariant. In the proof of sufficiency we use the
distribution of $R$ (see, for example, [10, p. 384, equation (15.55)]). The remainder
of the proof is essentially the same as that of Theorem 4.

THEOREM 9. Among all tests $H$ which have the properties described in the conclusions
of Theorems 6 and 7, the classical test based on $R$ is the most powerful.

As a corollary to this theorem we have the following result due to Simaika
[3]: Of all tests $H$ whose power is a function of $R$ only, the classical test based
on $R$ is the most powerful.

Simaika's result also follows easily from Theorem 8 and the density function
of $R$ in the same manner that Hsu's result followed from Theorem 4 and the
density function of $v$.

In the course of the proof of Theorem 9, the various symbols $W$, with or
without subscripts, will denote suitable functions of the variables exhibited,
and the various symbols $k$, with or without subscripts, will denote suitable
constants.

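We have that

(4.5) $P\{H \mid B\} = \int_H (2\pi)^{-3n/2}\, |B|^{n/2} \exp\left\{-\tfrac{1}{2} \sum_{j=1}^{n} \sum_{i,l=1}^{3} b_{il}\, y_{ij}\, y_{lj}\right\} dy_{11} \cdots dy_{3n}$

$= \int_H (2\pi\omega^2)^{-n/2} \exp\left\{-\frac{1}{2\omega^2} \left[y_1 - (\beta_2 y_2 + \beta_3 y_3)\right]^2\right\} W_0(s_{22}, s_{23}, s_{33}, \{\sigma_{ij}\})\, dy_{11} \cdots dy_{3n}$

$= (2\pi\omega^2)^{-n/2} \int_H \exp\left\{\frac{1}{\omega^2} (\beta_2 y_2 + \beta_3 y_3)' z\right\} \exp\left\{-\frac{1}{2\omega^2}\left[y_1^2 + n\left(\beta_2^2 s_{22} + 2\beta_2\beta_3 s_{23} + \beta_3^2 s_{33}\right)\right]\right\} W_0(s_{22}, s_{23}, s_{33}, \{\sigma_{ij}\})\, dy_{11} \cdots dy_{3n}.$

Now $(\beta_2 y_2 + \beta_3 y_3)' z$, which equals $(\beta_2 y_2 + \beta_3 y_3)' y_1$ because $y_1 - z$ is orthogonal to the plane of $y_2$ and $y_3$, is a function only of $\beta_2, \beta_3, s_{22}, s_{23}, s_{33}, r$, and $\theta$. Also
$h^2 + r^2 = y_1^2 = n s_{11}$, where $y_1^2$ stands for $y_1' y_1$. Thus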


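(4.6) $P\{H \mid B\} = \int V(h, r, s_{22}, s_{23}, s_{33})\, W_1(h, r, s_{22}, s_{23}, s_{33}, \{B\}) \exp\left\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\right\} d\theta\, dh\, dr\, ds_{22}\, ds_{23}\, ds_{33}$

$= \int V(h, r, s_{22}, s_{23}, s_{33})\, W_1(h, r, s_{22}, s_{23}, s_{33}, \{B\})\, (4hr)^{-1} \exp\left\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\right\} d\theta\, dh^2\, dr^2\, ds_{22}\, ds_{23}\, ds_{33}$

$= \int V\left(\sqrt{y_1^2 - r^2}, r, s_{22}, s_{23}, s_{33}\right) W_2\left(\sqrt{y_1^2 - r^2}, r, s_{22}, s_{23}, s_{33}, \{B\}\right) \exp\left\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\right\} d\theta\, dr^2\, dy_1^2\, ds_{22}\, ds_{23}\, ds_{33}.$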


Integrating with respect to $\theta$ and designating

$W_2 \int \exp\left\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\right\} d\theta$

by $W\left(\sqrt{y_1^2 - r^2}, r, s_{22}, s_{23}, s_{33}, \{B\}\right)$ we observe that just as in (2.4), $W$ is
monotonically increasing in $r$ (all other variables fixed). Thus we have

(4.7) $P\{H \mid B\} = \int V W\, dr^2\, dy_1^2\, ds_{22}\, ds_{23}\, ds_{33}.$


In constructing $H$ only the function $V$ is at our disposal, and this subject to the
limitations imposed by the conclusions of Theorems 6 and 7 and the fact that
$h^2 + r^2 = y_1^2 = n s_{11}$. The function $W$ is not within our control at all. With $y_1^2$,
$s_{22}, s_{23}, s_{33}$ fixed, $W$ is monotonically increasing with $r$. To maximize the power
it is therefore best to distribute the "mass" so that $V$ is as large as possible for
large values of $r$ and hence of $R$. This implies the classical test and proves the
theorem.


5. Stringency of the classical tests. Wald [8] calls a test $T_1$ "most stringent"
if the following is true: Let $\{T\}$ be the totality of tests. Let $\theta$ be the generic
point in the parameter space, and $P\{T \mid \theta\}$ be the power of $T$ at the point $\theta$.
Let $T_2$ be any test other than $T_1$. Then

$\sup_{\theta} \left[\sup_{\{T\}} P\{T \mid \theta\} - P\{T_1 \mid \theta\}\right] \le \sup_{\theta} \left[\sup_{\{T\}} P\{T \mid \theta\} - P\{T_2 \mid \theta\}\right].$

Of course, we have omitted to specify the totality $\{T\}$. One can admit all tests
whose size is $\le \alpha$, a given constant between 0 and 1, or restrict one's self to tests
whose size is exactly $\alpha$. We shall do the latter.

Under these circumstances we shall prove that the classical test of a linear 

hypothesis is most stringent. Our proof will occupy but a few lines, and is an easy 







consequence of the structure of the classical tests as described in the lemma of 
section 2. The result itself is a special case of an unpublished theorem due to 
G. H. Hunt and C. M. Stein, and all priority on this result is theirs. 


Return then to the notation of section 2. Let $\sigma$ be fixed at any arbitrary
positive value, and the surface

$\eta^2 = c_0^2$

be that one on which

$w_1(\eta) = \sup_{\{T\}} P\{T \mid \eta\} - P\{L_1 \mid \eta\}$

is a maximum, where $L_1$ is the classical test of the linear hypothesis. It is clear
that this maximum is actually achieved, and that $w_1(\eta)$ is a constant on the
surface $\eta^2 = c_0^2$. Let $L_2$ be any other test (of size $\alpha$), and $w_2(\eta)$ be the corresponding
function for $L_2$. We have only to show that on the surface $\eta^2 = c_0^2$
we cannot have everywhere $w_2(\eta) < w_1(\eta)$, and our proof is complete. If everywhere
on the surface $\eta^2 = c_0^2$ we had $w_2(\eta) < w_1(\eta)$, we would have, also on the same
surface, $P\{L_2 \mid \eta\} > P\{L_1 \mid \eta\}$. This would, however, violate Wald's Theorem 2
(section 2) and proves the desired result.


REFERENCES 


[1] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938),
p. 231.

[2] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika,
Vol. 32 (1941), p. 62.

[3] J. B. Simaika, "On an optimum property of two important statistical tests," Biometrika,
Vol. 32 (1941), p. 70.

[4] A. Wald, "On the power function of the analysis of variance test," Annals of Math.
Stat., Vol. 13 (1942), p. 434.

[5] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, The American Mathematical
Society, New York, 1943.

[6] John Wishart, "The generalized product moment distribution, etc.," Biometrika,
Vol. 20A (1928), p. 32.

[7] H. Cramér and H. Wold, "Some theorems on distribution functions," Lond. Math.
Soc. Jour., Vol. 11 (1936).

[8] A. Wald, "Tests of statistical hypotheses concerning several parameters when the
number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[9] P. C. Tang, "The power function of the analysis of variance etc.," Stat. Res. Memoirs,
Vol. 2 (1938) (University of London), p. 126.

[10] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, Charles Griffin and Company,
London, 1945.

[11] S. Saks, Theory of the Integral, (Second Edition), G. E. Stechert and Company, New
York, 1937.

[12] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Roy. Soc. London Phil. Trans., Ser. A, Vol. 231 (1933), pp. 289-337.

[13] Eberhard Hopf, Ergodentheorie, Chelsea, New York, 1948.





APPLICATION OF THE METHOD OF MIXTURES TO QUADRATIC 
FORMS IN NORMAL VARIATES 


By HERBERT RoBBINS AND E. J. G. PITMAN 
Institute of Statistics, University of North Carolina 


1. Summary. The method of mixtures, explained in Section 2, is applied to 
derive the distribution functions of a positive quadratic form in normal variates 
and of the ratio of two independent forms of this type. 


2. The method of mixtures. If

(1) $F_0(x), F_1(x), \cdots$

is any sequence of distribution functions, and if

(2) $c_0, c_1, \cdots$

is any sequence of constants such that

(3) $c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1$

(all summations will be from 0 to $\infty$ unless otherwise noted), then the function

(4) $F(x) = \sum c_j F_j(x)$

is called a mixture of the sequence (1).

It is sometimes helpful to interpret $F(x)$ in the following manner. Let $J, X_0,
X_1, \cdots$ be variates such that $J$ has the distribution $P[J = j] = c_j$ ($j = 0, 1, \cdots$)
and such that $X_j$ has the distribution function $F_j(x)$. Let $X$ be a variate such
that the conditional distribution function of $X$ given $J = j$ is $F_j(x)$. Then the
distribution function of $X$ is

$P[X \le x] = \sum P[J = j] \cdot P[X \le x \mid J = j] = \sum c_j F_j(x) = F(x).$

This interpretation of $F(x)$ will, however, not be involved in the present paper.
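The two-stage interpretation above (draw the index $J$, then draw from the $J$-th component) can be checked by simulation. The sketch below is illustrative only: the two uniform components, the weights and the helper names `sample_mixture` and `mixture_cdf` are hypothetical choices, not taken from the paper.

```python
import random

def sample_mixture(weights, samplers, rng):
    """Draw X by first drawing the index J with P[J = j] = weights[j]."""
    j = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return samplers[j](rng)

def mixture_cdf(weights, cdfs, x):
    """F(x) = sum_j c_j F_j(x), as in (4)."""
    return sum(c * F(x) for c, F in zip(weights, cdfs))

# Hypothetical components: uniform distributions on [0, 1] and on [0, 2].
weights = [0.25, 0.75]
samplers = [lambda rng: rng.uniform(0.0, 1.0), lambda rng: rng.uniform(0.0, 2.0)]
cdfs = [lambda x: min(max(x, 0.0), 1.0), lambda x: min(max(x / 2.0, 0.0), 1.0)]

rng = random.Random(0)
draws = [sample_mixture(weights, samplers, rng) for _ in range(100_000)]
empirical = sum(d <= 0.5 for d in draws) / len(draws)   # simulated P[X <= 0.5]
exact = mixture_cdf(weights, cdfs, 0.5)                 # 0.25*0.5 + 0.75*0.25
```

The empirical frequency should agree with the mixture value up to sampling error of order $n^{-1/2}$.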

The following statements are proved in [1]. If $x = (x_1, \cdots, x_n)$ is a vector
variable the function $F(x)$ defined by (4) is a distribution function, and for
any Borel set $S$,

(5) $\int_S dF(x) = \sum c_j \int_S dF_j(x).$

More generally, if $g(x)$ is any Borel measurable function then

(6) $\int g(x)\, dF(x) = \sum c_j \int g(x)\, dF_j(x)$

whenever the left hand side of (6) exists. In particular, the characteristic function
$\varphi(t)$ corresponding to $F(x)$ is

(7) $\varphi(t) = \sum c_j \varphi_j(t),$

where $\varphi_j(t)$ is the characteristic function corresponding to $F_j(x)$.
If each $F_j(x)$ has a derivative $f_j(x)$ then $F(x)$ has a derivative $f(x)$ given by

(8) $f(x) = \sum c_j f_j(x),$

provided that this series converges uniformly in some interval including $x$.
Conversely, if (8) is the relation between the frequency functions and if the
series is uniformly convergent in every finite interval, then the relation between
the distribution functions is given by (4). In practice we deduce (4) from (8), or,
using the uniqueness theorem for characteristic functions, from (7).

As regards computation, we observe that for any integers $0 \le p_1 \le p_2$ and
for any $x$ it follows from (3) and (4) that

(9) $0 \le F(x) - \sum_{p_1}^{p_2} c_j F_j(x) = \sum_{0}^{p_1 - 1} c_j F_j(x) + \sum_{p_2 + 1}^{\infty} c_j F_j(x)$

$\le \sup_{j < p_1} \{F_j(x)\} \cdot \left(\sum_{0}^{p_1 - 1} c_j\right) + \sup_{j > p_2} \{F_j(x)\} \cdot \left(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\right) \le 1 - \sum_{p_1}^{p_2} c_j.$

The existence of these upper bounds (the last a uniform one) for the error term
when the series (4) is replaced by a finite sum shows that series expansions of the
mixture type (4) are especially well adapted to computational work.
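The uniform bound in (9) is easy to observe numerically. The following sketch uses a hypothetical mixture chosen only because its sum is known in closed form: geometric weights $c_j = 2^{-(j+1)}$ and exponential components $F_j(x) = 1 - e^{-(j+1)x}$, for which $F(x) = 1 - e^{-x}/(2 - e^{-x})$. None of these choices come from the paper.

```python
import math

def c(j):
    """Hypothetical geometric weights c_j = 2**-(j+1); they sum to 1."""
    return 0.5 ** (j + 1)

def F(j, x):
    """Hypothetical component: exponential distribution function with rate j + 1."""
    return 1.0 - math.exp(-(j + 1) * x) if x > 0 else 0.0

def truncated_mixture(x, p1, p2):
    """Partial sum of (4) over p1 <= j <= p2 and the uniform error bound from (9)."""
    partial = sum(c(j) * F(j, x) for j in range(p1, p2 + 1))
    bound = 1.0 - sum(c(j) for j in range(p1, p2 + 1))
    return partial, bound

x = 0.7
exact = 1.0 - math.exp(-x) / (2.0 - math.exp(-x))   # closed form of the full series
partial, bound = truncated_mixture(x, 0, 10)
error = exact - partial                              # should lie in [0, bound]
```

With $p_1 = 0$ and $p_2 = 10$ the bound equals $2^{-11}$, independently of $x$, which is the sense in which the last bound in (9) is uniform.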

For some purposes it is useful to consider series expansions of the type (4)
where the $c_j$ may be of both signs and where the series $\sum c_j$ may diverge. Both
parts of (3) will, however, be satisfied in the cases considered here.

If $U, V$ are independent variates with respective distribution functions
$F(x), G(x)$ we shall denote the distribution function of any Borel measurable
function $H(U, V)$ by

$H(U, V)(F(x), G(x)).$

Now if $F(x), G(x)$ are both mixtures,

$F(x) = \sum c_j F_j(x), \qquad G(x) = \sum d_k G_k(x),$

then by (5),

$P[H(U, V) \le x] = \iint_{\{H(u,v) \le x\}} dF(u)\, dG(v) = \sum \sum c_j d_k \iint_{\{H(u,v) \le x\}} dF_j(u)\, dG_k(v),$

so that

(10) $H(U, V)\left(\sum c_j F_j(x), \sum d_k G_k(x)\right) = \sum \sum c_j d_k\, H(U, V)(F_j(x), G_k(x)).$










As an application of the principles set forth in this section we shall express 
as series of the mixture type (4) the distribution functions of any positive 
quadratic form in normal variates and of the ratio of any two independent forms 
of this type. Special cases of the problem have been dealt with by Tang [2], 
Hsu [3], and many others, but the method of mixtures permits a unified and 
simple treatment of the general case. 


3. Distribution of a positive quadratic form. We shall denote by $F_n(x)$ the
chi-square distribution function with $n > 0$ degrees of freedom,

(11) $F_n(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_0^x u^{n/2 - 1} e^{-u/2}\, du \quad (x > 0), \qquad F_n(x) = 0 \quad (x \le 0).$

The corresponding characteristic function is

(12) $\varphi_n(t) = \int_0^{\infty} e^{itx}\, dF_n(x) = (1 - 2it)^{-n/2} = w^n,$

where we have set $w = (1 - 2it)^{-1/2}$. We shall denote by $\chi^2_n$ any variate with
the distribution function (11).

Let $a$ be any constant such that $a > 0$. The characteristic function of the
variate $a \cdot \chi^2_n$ is

(13) $(1 - 2iat)^{-n/2} = [a(1 - 2it) - (a - 1)]^{-n/2} = a^{-n/2}\, w^n \left(1 - \left(1 - \frac{1}{a}\right) w^2\right)^{-n/2}.$

By the binomial theorem we have for any $a > 0$,

(14) $a^{-n/2} \left(1 - \left(1 - \frac{1}{a}\right) z\right)^{-n/2} = \sum c_j z^j \qquad \left(|z| < \left|1 - \frac{1}{a}\right|^{-1}\right),$

where

(15) $c_j = a^{-n/2} \cdot \frac{\tfrac{1}{2}n\left(\tfrac{1}{2}n + 1\right) \cdots \left(\tfrac{1}{2}n + j - 1\right)}{j!} \cdot \left(1 - \frac{1}{a}\right)^j \qquad (j = 0, 1, \cdots).$

For $a \ge 1$ we see from (15) that all the $c_j$ are non-negative. Likewise for $a > \tfrac{1}{2}$
(and hence a fortiori for $a \ge 1$) we have $|1 - 1/a|^{-1} > 1$ so that (14) holds
for all $|z| \le 1$; setting $z = 1$ it follows that the sum of all the $c_j$ is equal to 1.
Hence for $a \ge 1$,

$c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1.$

Since $|w| = |1 - 2it|^{-1/2} \le 1$ for all real $t$ it follows from (13) and (14) that
for $a \ge 1$,

(16) $(1 - 2iat)^{-n/2} = \sum c_j w^{n + 2j} = \sum c_j (1 - 2it)^{-(n + 2j)/2} = \sum c_j \varphi_{n + 2j}(t).$







Hence for $a \ge 1$ the distribution function $F_n(x/a)$ of the variate $a \cdot \chi^2_n$ is a mixture
of $\chi^2$ distribution functions,

(17) $F_n(x/a) = \sum c_j F_{n + 2j}(x),$

where the $c_j$, determined by the identity (14), are the probabilities of a negative
binomial distribution.

It may, in fact, be proved by a direct analysis, which we omit here, that (17)
holds for any $a > 0$. However, if $a < 1$ then the $c_j$ will be of alternating sign,
and if $a \le \tfrac{1}{2}$ then the series $\sum c_j$ will diverge. This shows incidentally that a
relation of the form (4) can hold even though the series $\sum c_j$ diverges and hence
the corresponding relation (7) does not hold for $t = 0$.
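Relation (17) can be verified numerically when $n$ is even, since the chi-square distribution function then has a finite closed form. The sketch below is illustrative: the helper names and the values $n = 4$, $a = 2.5$, $x = 6$ are hypothetical, the restriction to even degrees of freedom is an assumption made only to stay within the standard library, and the series is truncated at 400 terms.

```python
import math

def chi2_cdf_even(n, x):
    """Chi-square distribution function for an even number n of degrees of freedom."""
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 0.0
    for i in range(n // 2):
        total += term              # term == (x/2)**i / i!
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total

def negbin_weights(n, a, terms):
    """c_j of (15): a**(-n/2) * (n/2)(n/2+1)...(n/2+j-1)/j! * (1 - 1/a)**j."""
    c, weights = a ** (-n / 2.0), []
    for j in range(terms):
        weights.append(c)
        c *= (n / 2.0 + j) / (j + 1) * (1.0 - 1.0 / a)
    return weights

n, a, x = 4, 2.5, 6.0
weights = negbin_weights(n, a, 400)
lhs = chi2_cdf_even(n, x / a)                                        # F_n(x/a)
rhs = sum(c * chi2_cdf_even(n + 2 * j, x) for j, c in enumerate(weights))
```

The weights are built by the ratio $c_{j+1}/c_j = (\tfrac{1}{2}n + j)(1 - 1/a)/(j + 1)$ rather than from factorials, which avoids overflow for large $j$.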


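THEOREM 1. Let

$X = a\left(\chi^2_m + a_1 \chi^2_{m_1} + \cdots + a_r \chi^2_{m_r}\right)$

where the chi-square variates are independent and $a, a_1, \cdots, a_r$ are positive constants
such that

$a_i \ge 1 \qquad (i = 1, \cdots, r).$

Define constants $c_j$ by the identity¹

(18) $\prod_{i=1}^{r} \left\{a_i^{-m_i/2} \left[1 - \left(1 - \frac{1}{a_i}\right) z\right]^{-m_i/2}\right\} = \sum c_j z^j \qquad (|z| \le 1);$

then obviously

$c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1.$

Let

$M = m + m_1 + \cdots + m_r;$

then for every $x$,

(19) $P[X \le x] = \sum c_j F_{M + 2j}(x/a).$

For any integers $0 \le p_1 \le p_2$ and every $x$,

(20) $0 \le P[X \le x] - \sum_{p_1}^{p_2} c_j F_{M + 2j}(x/a) \le F_M(x/a) \cdot \left(\sum_{0}^{p_1 - 1} c_j\right) + F_{M + 2p_2 + 2}(x/a) \cdot \left(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\right) \le 1 - \sum_{p_1}^{p_2} c_j.$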


PROOF. The characteristic function of $X/a$ is, by (13) and (18),

$\varphi(t) = w^M \cdot \prod_{i=1}^{r} \left\{a_i^{-m_i/2} \left[1 - \left(1 - \frac{1}{a_i}\right) w^2\right]^{-m_i/2}\right\} = \sum c_j w^{M + 2j} = \sum c_j \varphi_{M + 2j}(t).$

Hence for any $y$,

$P[X/a \le y] = \sum c_j F_{M + 2j}(y),$

whence (19) follows on setting $x = ay$. Finally, since $F_n(x)$ is a decreasing function
of $n$ for fixed $x$, (20) follows from (9).

¹ If $r = 0$ we regard the left hand side of (18) as having the value 1.

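It should be observed that the coefficients $c_j$ determined by (18) can be
written explicitly as the multiple Cauchy products

$c_j = \sum_{i_1 + \cdots + i_r = j} \{c_{1,i_1} \cdots c_{r,i_r}\}$

where

$c_{i,j} = a_i^{-m_i/2} \cdot \frac{\tfrac{1}{2}m_i\left(\tfrac{1}{2}m_i + 1\right) \cdots \left(\tfrac{1}{2}m_i + j - 1\right)}{j!} \cdot \left(1 - \frac{1}{a_i}\right)^j \qquad (i = 1, \cdots, r;\ j = 0, 1, \cdots).$

$c_j^{(1)} = c_{1,j},$

$c_j^{(s)} = \sum_{i=0}^{j} \{c_i^{(s-1)} \cdot c_{s,j-i}\} \qquad (s = 2, \cdots, r),$

$c_j^{(r)} = c_j.$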
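The successive Cauchy products and formula (19) combine into a short computation. The following is a sketch under stated assumptions, not the authors' procedure: all degrees of freedom are taken even so that a closed-form chi-square distribution function can be used, the series is truncated at a fixed number of terms, and the function names and the numerical example are hypothetical. The example $X = \chi^2_2 + 2\chi^2_2$ is a sum of two independent exponential variates with means 2 and 4, whose distribution function is $1 - 2e^{-x/4} + e^{-x/2}$.

```python
import math

def chi2_cdf_even(n, x):
    """Chi-square distribution function for an even number n of degrees of freedom."""
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 0.0
    for i in range(n // 2):
        total += term              # term == (x/2)**i / i!
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total

def series_weights(m_i, a_i, terms):
    """Coefficients c_{i,j} of a_i**(-m_i/2) * [1 - (1 - 1/a_i) z]**(-m_i/2)."""
    c, out = a_i ** (-m_i / 2.0), []
    for j in range(terms):
        out.append(c)
        c *= (m_i / 2.0 + j) / (j + 1) * (1.0 - 1.0 / a_i)
    return out

def cauchy_product(u, v):
    """First len(u) coefficients of the product of two power series."""
    return [sum(u[i] * v[j - i] for i in range(j + 1)) for j in range(len(u))]

def quadratic_form_cdf(x, a, m, scaled, terms=200):
    """P[X <= x] for X = a*(chi2_m + sum_i a_i*chi2_{m_i}) with all a_i >= 1, via (19).

    `scaled` is a list of (a_i, m_i) pairs; even degrees of freedom are assumed.
    """
    c = [1.0] + [0.0] * (terms - 1)      # r = 0: the product in (18) equals 1
    for a_i, m_i in scaled:
        c = cauchy_product(c, series_weights(m_i, a_i, terms))
    M = m + sum(m_i for _, m_i in scaled)
    return sum(cj * chi2_cdf_even(M + 2 * j, x / a) for j, cj in enumerate(c))

x = 3.0
series_value = quadratic_form_cdf(x, 1.0, 2, [(2.0, 2)])
exact = 1.0 - 2.0 * math.exp(-x / 4.0) + math.exp(-x / 2.0)
```

Truncating at `terms` coefficients leaves an error no larger than one minus the sum of the retained $c_j$, by (20).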


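4. Distribution of a ratio. The ratio $\chi^2_m / \chi^2_n$ of two independent chi-square
variates has the distribution function

(21) $F_{m,n}(x) = \frac{\Gamma\left(\tfrac{1}{2}(m + n)\right)}{\Gamma\left(\tfrac{1}{2}m\right) \Gamma\left(\tfrac{1}{2}n\right)} \int_0^x u^{m/2 - 1} (1 + u)^{-(m + n)/2}\, du \quad (x > 0), \qquad F_{m,n}(x) = 0 \quad (x \le 0).$

In computational work we can use the tables of the Beta distribution function

$I_x(r, s) = \frac{\Gamma(r + s)}{\Gamma(r) \cdot \Gamma(s)} \int_0^x u^{r - 1} (1 - u)^{s - 1}\, du \quad (0 \le x \le 1), \qquad I_x(r, s) = 0 \ (x < 0), \quad 1 \ (x > 1),$

together with the identity

$F_{m,n}(x) = I_{x/(1+x)}\left(\tfrac{1}{2}m, \tfrac{1}{2}n\right).$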
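THEOREM 2. Let

(22) $X = \frac{a \cdot \left(\chi^2_m + a_1 \chi^2_{m_1} + \cdots + a_r \chi^2_{m_r}\right)}{\chi^2_n + b_1 \chi^2_{n_1} + \cdots + b_s \chi^2_{n_s}}$

where the $\chi^2$ variates are independent and $a, a_1, \cdots, a_r, b_1, \cdots, b_s$ are positive
constants such that

$a_i \ge 1 \quad (i = 1, \cdots, r), \qquad b_i \ge 1 \quad (i = 1, \cdots, s).$

Define constants $c_j$, $d_k$ by the identities

$\prod_{i=1}^{r} \left\{a_i^{-m_i/2} \left[1 - \left(1 - \frac{1}{a_i}\right) z\right]^{-m_i/2}\right\} = \sum c_j z^j, \qquad \prod_{i=1}^{s} \left\{b_i^{-n_i/2} \left[1 - \left(1 - \frac{1}{b_i}\right) z\right]^{-n_i/2}\right\} = \sum d_k z^k \qquad (|z| \le 1);$

then

$c_j \ge 0, \quad \sum c_j = 1, \quad d_k \ge 0, \quad \sum d_k = 1.$

Let

$M = m + m_1 + \cdots + m_r, \qquad N = n + n_1 + \cdots + n_s;$

then for every $x$,

$P[X \le x] = \sum \sum c_j d_k \cdot F_{M + 2j, N + 2k}(x/a),$

and for any integers $0 \le p_1 \le p_2$, $0 \le q_1 \le q_2$ and every $x$,

$0 \le P[X \le x] - \sum_{p_1}^{p_2} \sum_{q_1}^{q_2} c_j d_k \cdot F_{M + 2j, N + 2k}(x/a) \le 1 - \left(\sum_{p_1}^{p_2} c_j\right) \cdot \left(\sum_{q_1}^{q_2} d_k\right).$

PROOF. Let $U, V$ denote respectively numerator and denominator of (22).
From Theorem 1,

$P[U \le x] = \sum c_j F_{M + 2j}(x/a),$

$P[V \le x] = \sum d_k F_{N + 2k}(x).$

Hence by (10), for every $x$,

$P[X \le x] = P[U/V \le x] = \sum \sum c_j d_k \cdot F_{M + 2j, N + 2k}(x/a).$

The error bound holds because the omitted terms have total weight $1 - (\sum_{p_1}^{p_2} c_j)(\sum_{q_1}^{q_2} d_k)$ and each distribution function lies between 0 and 1. The rest of the theorem is obvious.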
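Theorem 2 can be illustrated with a case whose answer is available in closed form. The sketch below rests on several assumptions that are not in the paper: degrees of freedom are even, so that $I_p(r, s)$ with integer $r, s$ reduces to a binomial tail sum; the example takes the numerator $\chi^2_2$ (so $c_0 = 1$) and the denominator $\chi^2_2 + 2\chi^2_2$ (so $N = 4$ and $d_k = 2^{-(k+1)}$); and the function names are hypothetical. For this example a direct calculation gives $P[X \le x] = 1 - 1/((1 + x)(1 + 2x))$.

```python
import math

def ratio_cdf_even(m, n, x):
    """F_{m,n}(x) = P[chi2_m / chi2_n <= x] for even m and n.

    Uses F_{m,n}(x) = I_p(m/2, n/2) with p = x/(1+x); for integer arguments
    the incomplete Beta function equals a binomial tail probability.
    """
    if x <= 0:
        return 0.0
    r, s, p = m // 2, n // 2, x / (1.0 + x)
    top = r + s - 1
    return sum(math.comb(top, i) * p ** i * (1.0 - p) ** (top - i)
               for i in range(r, top + 1))

def example_cdf(x, terms=60):
    """P[X <= x] for X = chi2_2 / (chi2_2 + 2*chi2_2) as the mixture of Theorem 2."""
    d = [0.5 ** (k + 1) for k in range(terms)]      # weights for b_1 = 2, n_1 = 2
    return sum(dk * ratio_cdf_even(2, 4 + 2 * k, x) for k, dk in enumerate(d))

x = 0.8
series_value = example_cdf(x)
exact = 1.0 - 1.0 / ((1.0 + x) * (1.0 + 2.0 * x))
```

Retaining 60 terms leaves omitted weight $2^{-60}$, so by the bound in Theorem 2 the truncation error is far below rounding error.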
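COROLLARY. Let

$X = \frac{\chi^2_m}{a \chi^2_r + b \chi^2_s}$

where the $\chi^2$ variates are independent and

$0 < a < b.$

Define

$\alpha = a/b, \qquad N = r + s,$

$c_j = \alpha^{s/2} \cdot \frac{\tfrac{1}{2}s\left(\tfrac{1}{2}s + 1\right) \cdots \left(\tfrac{1}{2}s + j - 1\right)}{j!} \cdot (1 - \alpha)^j \qquad (j = 0, 1, \cdots);$

then

$c_j > 0, \qquad \sum c_j = 1,$

and for every $x$,

$P[X \le x] = \sum c_j F_{m, N + 2j}(ax).$

For any integers $0 \le p_1 \le p_2$ and every $x$,

(23) $0 \le P[X > x] - \sum_{p_1}^{p_2} c_j [1 - F_{m, N + 2j}(ax)] \le [1 - F_{m,N}(ax)] \cdot \left(\sum_{0}^{p_1 - 1} c_j\right) + [1 - F_{m, N + 2p_2 + 2}(ax)] \cdot \left(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\right) \le 1 - \sum_{p_1}^{p_2} c_j.$

PROOF. Except for (23) this is a special case of Theorem 2. To prove (23)
we observe that

$P[X > x] = 1 - P[X \le x] = \sum c_j [1 - F_{m, N + 2j}(ax)],$

and since for fixed $m$ and $x$, $F_{m,n}(x)$ is an increasing function of $n$, (23) follows
in the same way as (9).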


5. The non-central case. Let $Y$ be normal (0, 1) and let $X = (Y + d)^2$, where $d$
is any constant. The frequency function of $X$ is, for $x > 0$,

$f(x) = (2\pi x)^{-1/2}\, e^{-(x + d^2)/2} \cdot \left(e^{d\sqrt{x}} + e^{-d\sqrt{x}}\right)/2.$

By expanding the last factor into a power series it is easily seen that

(24) $f(x) = \sum p_j \cdot f_{1 + 2j}(x),$

where $f_n(x) = F_n'(x)$ is the chi-square frequency function with $n$ degrees of
freedom and where

$p_j = e^{-d^2/2} \cdot \left(\tfrac{1}{2}d^2\right)^j / j! \qquad (j = 0, 1, \cdots).$

Since the identity

(25) $e^{-d^2/2}\, e^{(d^2/2) z} = \sum p_j z^j \qquad (\text{all } z)$

holds, it follows that

$p_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum p_j = 1.$

The series (24) is uniformly convergent in every finite interval, so that we
can write the distribution function $F(x)$ and characteristic function $\varphi(t)$ of $X$
in the forms

$F(x) = \sum p_j \cdot F_{1 + 2j}(x),$

$\varphi(t) = \sum p_j \cdot \varphi_{1 + 2j}(t) = w\, e^{-\frac{1}{2} d^2 (1 - w^2)},$

where again we have set $w = (1 - 2it)^{-1/2}$.
Now let $Y_1, \cdots, Y_n$ be independent and normal (0, 1) variates and let

(26) $X = (Y_1 + d_1)^2 + \cdots + (Y_n + d_n)^2,$

where the $d_i$ are constants such that

$d_1^2 + \cdots + d_n^2 = d^2.$

The characteristic function of $X$ is then

$\varphi(t) = w^n e^{-\frac{1}{2} d^2 (1 - w^2)} = \sum p_j w^{n + 2j} = \sum p_j \varphi_{n + 2j}(t),$

and hence the distribution function $F(x)$ of $X$ is again a mixture of $\chi^2$ distribution
functions,

(27) $F(x) = \sum p_j \cdot F_{n + 2j}(x),$

where the $p_j$, determined by the identity (25), are the probabilities of a Poisson
distribution with parameter $\lambda = \tfrac{1}{2} d^2$. We shall denote the non-central chi-square
variate (26) by $\chi'^2_{n,d}$.
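The Poisson mixture (27) gives a direct way to evaluate the non-central chi-square distribution function. The sketch below is illustrative and makes assumptions beyond the text: $n$ is taken even so that a closed-form central distribution function can be used, the series is truncated at a fixed number of terms, and the check against simulation uses hypothetical values $d_1 = 1$, $d_2 = 0.5$, $x = 3$.

```python
import math
import random

def chi2_cdf_even(n, x):
    """Chi-square distribution function for an even number n of degrees of freedom."""
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 0.0
    for i in range(n // 2):
        total += term              # term == (x/2)**i / i!
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total

def noncentral_chi2_cdf(n, d2, x, terms=200):
    """F(x) = sum_j p_j F_{n+2j}(x) with Poisson weights p_j, lambda = d2/2 (n even)."""
    lam = d2 / 2.0
    p, total = math.exp(-lam), 0.0
    for j in range(terms):
        total += p * chi2_cdf_even(n + 2 * j, x)
        p *= lam / (j + 1)         # p_{j+1} = p_j * lambda / (j + 1)
    return total

# Simulation check of (26)-(27) with hypothetical shifts d_1, d_2.
rng = random.Random(1)
d = [1.0, 0.5]
d2 = sum(v * v for v in d)
x = 3.0
mixture = noncentral_chi2_cdf(len(d), d2, x)
draws = 100_000
hits = sum(sum((rng.gauss(0.0, 1.0) + v) ** 2 for v in d) <= x for _ in range(draws))
empirical = hits / draws
```

Only $d^2 = d_1^2 + \cdots + d_n^2$ enters the mixture, so any other choice of shifts with the same sum of squares would give the same value.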

We can now generalize Theorems 1 and 2 in a straightforward manner to 
cover non-central chi-square variates. We shall state only the generalization 
of the Corollary of Theorem 2 to the case in which the numerator is non-central. 


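THEOREM 3. Let

$X = \frac{\chi'^2_{m,d}}{a \chi^2_r + b \chi^2_s}$

where the $\chi^2$ variates are independent and

$0 < a < b.$

Define

$\lambda = \tfrac{1}{2} d^2, \qquad \alpha = a/b, \qquad N = r + s,$

$p_j = e^{-\lambda} \lambda^j / j! \qquad (j = 0, 1, \cdots),$

$c_k = \alpha^{s/2} \cdot \frac{\tfrac{1}{2}s\left(\tfrac{1}{2}s + 1\right) \cdots \left(\tfrac{1}{2}s + k - 1\right)}{k!} \cdot (1 - \alpha)^k \qquad (k = 0, 1, \cdots);$

then

$c_k > 0, \qquad \sum c_k = 1,$

and for every $x$,

$P[X \le x] = \sum \sum p_j c_k \cdot F_{m + 2j, N + 2k}(ax).$

For any integers $0 \le g_1 \le g_2$, $0 \le h_1 \le h_2$,

$0 \le P[X \le x] - \sum_{g_1}^{g_2} \sum_{h_1}^{h_2} p_j c_k \cdot F_{m + 2j, N + 2k}(ax) \le 1 - \left(\sum_{g_1}^{g_2} p_j\right) \left(\sum_{h_1}^{h_2} c_k\right).$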


REFERENCES 
[1] Herbert Robbins, "Mixture of distributions," Annals of Math. Statistics, Vol. 19
(1948), p. 360.
[2] P. C. Tang, "The power function of the analysis of variance tests with tables and
illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), p. 126.
[3] P. L. Hsu, "Contributions to the theory of 'Student's' t-test as applied to the problem
of two samples," ibid., p. 1.


THE JOINT DISTRIBUTION OF SERIAL CORRELATION 
COEFFICIENTS 


By M. H. QuENOUILLE 
Rothamsted Experimental Station 


1. Summary. An expression for the joint distribution of serial correlation 
coefficients, circularly defined, has been derived. It has been shown that this 
distribution possesses properties similar to those already encountered in the 
distribution of a single serial correlation coefficient, i.e. it is defined by different
functional forms for various subregions. The distribution thus found is of little
use for computational purposes. Consequently, approximate forms have been 
investigated and the suitability of the ordinary partial correlation coefficient 
for large-sample testing has been inferred. 


2. Introduction. Anderson [1] has derived the distribution of the serial
correlation coefficient

$r_1 = \frac{\sum_{i=1}^{n} \epsilon_i \epsilon_{i+1} - \left(\sum_{i=1}^{n} \epsilon_i\right)^2 / n}{\sum_{i=1}^{n} \epsilon_i^2 - \left(\sum_{i=1}^{n} \epsilon_i\right)^2 / n},$

where the $\epsilon_i$ are normally and independently distributed with mean $\mu$ and
variance $\sigma^2$ and where a circular definition is employed, so that $\epsilon_{n+i}$ is defined
to be equal to $\epsilon_i$. However, in making a test of any series, we shall usually be
faced with a set of serial correlation coefficients, so that we shall require a joint
distribution function of $r_1, r_2, \cdots, r_m$ say. This distribution function is derived
below by an extension of the method used by Koopmans [2].

It should be noted that Bartlett [8] has shown that for large samples the
variances and covariances of the $r_j$ are independent of the distribution of $\epsilon_i$
under fairly wide conditions. This means that the joint distribution function
obtained for normal $\epsilon_i$ will often give a good approximation for non-normal $\epsilon_i$
and can be used as the basis for any test of the correlogram.





3. Conditions on the $r_j$. It is easily seen that the $r_j$ cannot take all values
from +1 to −1 independently. For example, $r_2$ cannot take a value near −1
if $r_1$ takes a value near +1. As a result, there will be certain necessary conditions
that the $r_j$ will have to fulfil. It is not difficult to find these conditions, since, if
$y_j$ ($j = 1, 2, \cdots, m$) are any set of variables, then

(1) $\sum_{i=1}^{n} (\epsilon_{i+j}\, y_j)^2 = \left(\sum_{i=1}^{n} \epsilon_i^2\right) r_{|j-l|}\, y_j\, y_l \qquad (r_0 = 1),$

where $\epsilon_i$ may or may not be corrected for the mean and the double-suffix summation
convention is employed.




Thus, provided $0 < m < n/2$, we will have

(2) $\begin{vmatrix} 1 & r_1 & r_2 & \cdots & r_m \\ r_1 & 1 & r_1 & \cdots & r_{m-1} \\ \vdots & & & & \vdots \\ r_m & r_{m-1} & \cdots & r_1 & 1 \end{vmatrix} \ge 0$

as a necessary condition that the right-hand side of (1) be positive definite and
this expression will impose necessary conditions upon the joint distribution
of the $r_j$.

Fig. 1 gives the limits of possible values of $r_1$ and $r_2$ subject to (a) no restriction,
(b) $r_3 = 0$, (c) $r_3 = r_4 = 0$.
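The simplest of these conditions can be observed on simulated data. The sketch below is an illustration under assumptions, not the paper's method: it computes mean-corrected circular serial correlations for a hypothetical series of 25 pseudo-random normal values and checks the constraint $r_2 \ge 2r_1^2 - 1$, which is the boundary the paper later quotes for $m = 2$ and which follows from the non-negativity of the $3 \times 3$ matrix with rows $(1, r_1, r_2)$, $(r_1, 1, r_1)$, $(r_2, r_1, 1)$.

```python
import random

def circular_serial_correlations(eps, max_lag):
    """Mean-corrected circular serial correlations r_1, ..., r_max_lag."""
    n = len(eps)
    correction = sum(eps) ** 2 / n
    denom = sum(e * e for e in eps) - correction
    return [
        (sum(eps[i] * eps[(i + lag) % n] for i in range(n)) - correction) / denom
        for lag in range(1, max_lag + 1)
    ]

rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(25)]   # hypothetical series, n = 25
r1, r2 = circular_serial_correlations(eps, 2)

# Non-negativity of the 3x3 matrix reduces to |r1| <= 1, |r2| <= 1
# and r2 >= 2*r1**2 - 1 (small tolerance for rounding).
feasible = abs(r1) <= 1 and abs(r2) <= 1 and r2 >= 2 * r1 * r1 - 1 - 1e-12
```

The index `(i + lag) % n` implements the circular definition, under which the series is continued periodically.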


4. Complex Integration in m Variables. Before finding the joint distribution
function of the $r_j$ some introductory remarks on complex integration involving $m$
variables will be necessary.

We can evaluate an integral such as

$\int \cdots \int \frac{f(z_1, z_2, \cdots, z_m)}{\prod_{j=1}^{m} (z_j - a_j)}\, dz_1 \cdots dz_m,$

where $\mathcal{I}(a_j) = 0$ and $f(z_1, z_2, \cdots, z_m)$ is regular in the region $\mathcal{I}(z_j) > 0$, by
successive Cauchy integrations, so the integral has a value $(2\pi i)^m f(a_1, \cdots, a_m)$.
In the same manner as for Cauchy integration, it will be possible to distort the
contours over which we integrate so that we can evaluate

$\int \cdots \int_{S} \frac{f(z_1, \cdots, z_m)}{\prod_{j=1}^{m} (z_j - a_j)}\, dz_1 \cdots dz_m,$

provided that $f(z_1, \cdots, z_m)$ is regular in the region defined by $S$ and $(a_1, \cdots, a_m)$
is enclosed in this region.
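The value $(2\pi i)^m f(a_1, \cdots, a_m)$ obtained by successive Cauchy integrations can be confirmed numerically for $m = 2$. The following sketch makes simplifying assumptions that differ from the paper's setting: the contours are unit circles centred on the poles, the integrand uses a hypothetical entire function $f(z_1, z_2) = e^{z_1}(1 + z_2^2)$, and the integrals are approximated by the trapezoidal rule.

```python
import cmath
import math

def circle_integral(g, center, radius, steps=64):
    """Trapezoidal approximation of the contour integral of g over a circle."""
    total = 0j
    for k in range(steps):
        w = cmath.exp(2j * math.pi * k / steps)
        z = center + radius * w
        total += g(z) * (radius * w * 2j * math.pi / steps)   # dz = i*R*e^{it} dt
    return total

def f(z1, z2):
    """Hypothetical entire test function of two complex variables."""
    return cmath.exp(z1) * (1 + z2 * z2)

a1, a2 = 0.3 + 0.2j, -0.1 + 0.4j

def inner(z1):
    """Integrate over z2 first, holding z1 fixed."""
    return circle_integral(lambda z2: f(z1, z2) / (z2 - a2), a2, 1.0)

value = circle_integral(lambda z1: inner(z1) / (z1 - a1), a1, 1.0)
expected = (2j * math.pi) ** 2 * f(a1, a2)
```

For a periodic analytic integrand the trapezoidal rule converges very quickly, so 64 nodes per circle already reproduce the expected value to near rounding error.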
More generally, if we have an integral of the form

$\int \cdots \int_{S} \frac{f(z_1, \cdots, z_m)}{\prod_{j=1}^{m} (a_{ij} z_i - b_j)}\, dz_1 \cdots dz_m,$

and we make the transformations $w_j = a_{ij} z_i$ and $b_j = a_{ij} c_i$, i.e. $W = AZ$,
$C = A^{-1}B$, it is possible, in the above manner, to evaluate the integral as

(3) $\pm \frac{(2\pi i)^m}{|A|}\, f(c_1, \cdots, c_m).$

Suppose we now consider the integral

$\int \cdots \int_{S} \frac{f(z_1, \cdots, z_m)}{\prod_{j=1}^{n} (a_{ij} z_i - b_j)}\, dz_1 \cdots dz_m,$

where $n > m$. We may select a set, $g_k$, of $m$ equations $a_{ij} z_i = b_j$, and let
$A_k = [a_{ij}]$, $B_k = [b_j]$, $C_k = A_k^{-1} B_k = [c_{ik}]$. Then, we may carry out the integration
as previously, in this case, summing a series of terms for various combinations
of $m$ equations out of the possible $n$. The value of the integral may then be
written

(4) $(2\pi i)^m \sum_k \pm \frac{f(c_{1k}, \cdots, c_{mk})}{|A_k| \prod_{l \notin g_k} (a_{il} c_{ik} - b_l)},$

where the summation occurs over the points $(c_{1k}, c_{2k}, \cdots, c_{mk})$ lying in the
region defined by $S$ and the product term excludes the set of equations $g_k$. The
ambiguity of sign in (3) and (4) arises from the Jacobian $|A_k|$ and the sign
must be chosen which makes the transformation of $dz_1, \cdots, dz_m$ yield a positive
element. It must be noted that it is possible to obtain several expansions of the
form (4) according to the convention that is employed in defining "enclosure"
for each of the variables.


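5. Integral form for the joint distribution function. We can, without loss of
generality, assume $\sigma = 1$. Suppose that

$p = \sum_{i=1}^{n} \epsilon_i^2 - \left(\sum_{i=1}^{n} \epsilon_i\right)^2 / n, \qquad q_j = \sum_{i=1}^{n} \epsilon_i \epsilon_{i+j} - \left(\sum_{i=1}^{n} \epsilon_i\right)^2 / n,$

where $\epsilon_1, \epsilon_2, \cdots, \epsilon_n$ are independent, so that $r_j = q_j / p$. Then by a consideration
of $n$ dimensional space, we can see that $p$ is distributed independently of $r_1, \cdots, r_m$
so that their joint distribution can be written $g(p)\, h(r_1, \cdots, r_m)\, dp\, dr_1 \cdots dr_m$.
The joint distribution of $p$ and $q_1, \cdots, q_m$ can thus be written

(5) $f(p, q_1, \cdots, q_m)\, dp\, dq_1 \cdots dq_m = \frac{g(p)}{p^m}\, h\left(\frac{q_1}{p}, \cdots, \frac{q_m}{p}\right) dp\, dq_1 \cdots dq_m,$

where it is not difficult to see that

(6) $g(p) = \frac{p^{(n-3)/2}\, e^{-p/2}}{2^{(n-1)/2}\, \Gamma\left(\frac{n-1}{2}\right)}.$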


We can now find the joint distribution of p and q, --- , dm by inverting the 
characteristic function of these variables. This is given by 


1 ” 2 ; 
(mr [. aie f ex| -= + i(np + a0) | rr 


= aor fo [exp | BE] dade, 


= 1/\A}, 









= la, €@ °°", én] 











n—l 


| A | = II (1 — 2in — 276; Kj1), Kjt = COS omit 
l=] 





’ 


270; b 
1 — 2in’ 





a—l 
= (1 = Qin)” Il (1 _ Kj Kji), i = 
j=1 
so that the joint distribution of p and q, --- , @mis 
1 - 1 , 
S(P, 1° + * Im) = (Qr)"4 [ —y | [ai exp {—7a(np + 4;9;)} dnd --- dOm 


- 1 [ ~in7 1 __ On panes -_ 
(7) = (Qr)™4i iy (1 2in) s, (ar 


: (i — 2indesai\ dk, +++ dkm 
exp { = 9 (Qin 








dn, 
















= 00 

— 2in 

by region S enclosing the same set of aunties on the real hyperplane, and S 

can be chosen independent of y. Thus it will be possible to reverse the order of 
wo 


where S, is the region bounded by x; = + . Now S, can be replaced 


integration in (7) provided that [ | 1 — Qin ener a converges, i.e. provided 
n > 2m + 3. Then since 


oo 


] “4 (n—2m— e 
On (1 — in)? ” exp {—in(p — x;q;)} dn 
we 
s n—2m— Tr 


2 
0 for p< xq, 


we get 


ep 





S(p, Las Qn) = ts 
ghln— D (oq ” “ 


(p - at _ Kj Gi — 3) 


a 3dx1 +--+ dkm, 
"TT qG — 7) 


where S_ encloses the same singularities as S, all of which lie in the region 
p = «;q;. lf we now use (5) and (6) we get 


_{n—1 
ey ) 


h(ry Tas Tu) a = — 2m — ') 
| ie eae — 
(9) 


(1 — kj Sila 3) 
=n—1 : a dky +++ dkm. 
* (Qri el fj ‘ ; ; 

TI « (1 — xjKja) 

l=1 
In a similar manner, it is possible to derive for $n > 2m + 3$ the joint distribution
of serial correlation coefficients, $r_1, \cdots, r_m$, uncorrected for the mean, in
the form


I] (1 — KjK;2) 


l=1 


. : (1 — k; 7;)} (n—2m—2) 
(10) h(h; oe Fm) — . = - i) a [- | he : ] dk, sis dkm ° 


6. Extension for variables in an autoregressive scheme. Madow [4] has shown
how to extend the distribution of the serial correlation coefficient for uncorrelated
variables to the case when the variables $x_i$ are connected by a linear Markoff
scheme, $x_i = \rho x_{i-1} + \epsilon_i$, with a normal distribution of the error $\epsilon_i$. It is worth
noting that the method used by Madow can be applied to derive the joint
distribution of serial correlations of variables $x_i$, which are connected by a linear
autoregressive scheme of order $m$, or less,

$a_0 x_i + a_1 x_{i-1} + \cdots + a_m x_{i-m} = \epsilon_i,$

where $\epsilon_1, \cdots, \epsilon_n$ are normally and independently distributed, and $\epsilon_{n+i} = \epsilon_i$.¹


Under these conditions, the expression (9) will be modified by a factor 
n iN 4(n—1) 
Di 
i=1 a 1 7 
v2 (A + 2B;r,)-0?” 
Ej | 


i=1 


(11) 


where 


A 


B; 


while (10) will be modified by a similar factor with n replacing n — 1. 


7. Reduction of the distribution function integral. Using the method described
in section 4, it is now possible to reduce the integral given in (9), if we observe
that $\kappa_{jl} = \kappa_{j,n-l}$ and assume n odd. We then have


] (1 ate K;7;)8%—2m—9) 


h(r i 2 i “\m cee eee at dk adel dk, 
T1 ) ,{n — 2m — 1\ (272) Js $(m=1) K1 A 
7 s TT (= ajay 


| 


(12) LI |i) 


oy f a5 ~ 


_ 


ix, “jt = Ki | 


where J = (1, 1, ---, 1), 7’ = (m1, 12, +** 4 Tm)y Xj = (Kjty °°* 4 Kmi) and 
K, is the matrix formed from a set g, of the m matrices *;, arranged in order. 
The factors in the summation can most easily be determined if we put 

1 J 

r K, 
A(ri, +++ , %m—1). To demonstrate the manner in which formula (12) works, we 
shall consider m = 2. From formula (2) we can see that a limit to the possible
values that $r_2$ can take is given by $r_2 = 2r_1^2 - 1$, i.e. by the curve $(\cos\theta, \cos 2\theta)$


«x A(r,, -++, Tm-1) — 7m and sum over the region for which r, < 


¹ This is a sufficient condition for $x_{n+i} = x_i$.







in the $(r_1, r_2)$ plane. It is not difficult to see that there are ${}^{\frac12(n-1)}C_2$ possible terms in
(12) and that each of these terms is proportional to the $\frac12(n - 2m - 3)$th power
of the distance from a line in the $(r_1, r_2)$ plane. These lines are the joins of the
points $(\cos 2\pi i/n, \cos 4\pi i/n)$, $i = 1, \ldots, \frac12(n - 1)$, and the joins of such points
on the curve $(\cos\theta, \cos 2\theta)$ give the outer limits of the possible values of $r_1$ and $r_2$.
It can also be seen that these points correspond to the equations xj«;, = 1 (each 
of these equations determines a plane in 4-dimensional complex space), while 
the joins of these points correspond to the singularities defined by and terms 
arising from pairs of these equations. Furthermore, since the sum of residues in 
any plane is zero, the sum of contributions, taken with appropriate signs, arising 
from lines through any of these points is zero, i.e. the sum of all possible terms 
involving any particular x; will disappear. This leads to several possible 
expansions for h(r; , «++ , 1m). 

If we consider the particular case n = 9, then each term in the expansion (12)
is proportional to the distance from one of the lines joining the points $(\cos 2\pi i/9, \cos 4\pi i/9)$,
$i = 1, 2, 3, 4$. These lines may be denoted by $l_{ij}$. Then the contribution from
$l_{ij}$ is given by


3 KiuikKiy — (xi + Kij)T1 + 3(re + 1) 


(kK; = Kia) (Ki — Kin) (Ki: — Kis) (Ki om? kik) (Ki; =— K1i) F 


where $j > i$ and $\kappa_{1a} = \cos(2\pi a/9)$.


The values of this expression are:

$l_{12}$: $-1.979 + 2.938\, r_1 - 1.563\, r_2$,
$l_{13}$: $0.926 - 2.106\, r_1 + 3.959\, r_2$,
$l_{14}$: $1.053 - 0.832\, r_1 - 2.396\, r_2$,
$l_{23}$: $-5.012 - 3.959\, r_1 - 6.065\, r_2$,
$l_{24}$: $3.033 + 6.897\, r_1 + 4.502\, r_2$,
$l_{34}$: $-4.086 - 6.065\, r_1 - 2.106\, r_2$,


where, for example, the contribution from $l_{12}$ acts in the region for which
$1.563\, r_2 < -1.979 + 2.938\, r_1$. Fig. 2 demonstrates the configuration for this
case. It is seen that the frequency surface is a tetrahedron. As particular examples
of the identities mentioned above we have


$$l_{12} + l_{13} + l_{14} = 0,$$
$$-l_{12} + l_{23} + l_{24} = 0,$$
$$-l_{13} - l_{23} + l_{34} = 0.$$
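The numerical coefficients quoted for n = 9 can be checked mechanically. The following is a minimal sketch, not part of the original paper: it assumes the six linear expressions exactly as printed (three decimals), and the names `LINES`, `point`, `value` and `signed_sum` are illustrative only. It checks that each expression vanishes, up to rounding, at the two points $(\cos 2\pi i/9, \cos 4\pi i/9)$ that its line joins, and that the three identities hold coefficient by coefficient.

```python
import math

# Coefficients (constant, r1, r2) of the six contributions quoted for n = 9.
LINES = {
    (1, 2): (-1.979, 2.938, -1.563),
    (1, 3): (0.926, -2.106, 3.959),
    (1, 4): (1.053, -0.832, -2.396),
    (2, 3): (-5.012, -3.959, -6.065),
    (2, 4): (3.033, 6.897, 4.502),
    (3, 4): (-4.086, -6.065, -2.106),
}


def point(i, n=9):
    """Point (cos 2*pi*i/n, cos 4*pi*i/n) on the curve (cos t, cos 2t)."""
    t = 2.0 * math.pi * i / n
    return math.cos(t), math.cos(2.0 * t)


def value(key, r1, r2):
    """Evaluate the linear expression attached to the line l_ij at (r1, r2)."""
    c0, c1, c2 = LINES[key]
    return c0 + c1 * r1 + c2 * r2


def signed_sum(terms):
    """Coefficient-wise signed sum; terms is a list of (sign, (i, j)) pairs."""
    return tuple(sum(sign * LINES[key][k] for sign, key in terms) for k in range(3))


# The three identities quoted in the text.
IDENTITIES = [
    [(1, (1, 2)), (1, (1, 3)), (1, (1, 4))],
    [(-1, (1, 2)), (1, (2, 3)), (1, (2, 4))],
    [(-1, (1, 3)), (-1, (2, 3)), (1, (3, 4))],
]

if __name__ == "__main__":
    for key in LINES:
        residuals = [abs(value(key, *point(i))) for i in key]
        print(key, "residuals at its two points:", residuals)
    for terms in IDENTITIES:
        print("identity coefficients:", signed_sum(terms))
```

Because the printed coefficients are rounded, the residuals at the points are small but not exactly zero; the identities, by contrast, cancel to within floating-point error.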


For a general value of m, we shall find that the hyperplanes joining sets of m
points $(\cos 2\pi i/n, \cos 4\pi i/n, \ldots, \cos 2m\pi i/n)$ will be singularities on the
frequency hypersurface. The hyperplanes passing through sets of m successive
points will give the limits of possible values of $r_1, \ldots, r_m$. Furthermore, the
sum of contributions (with appropriate signs) to the frequency function from
the set of $\frac12(n - 2m + 1)$ hyperplanes passing through any point will be zero.


8. Integral approximation for the distribution function. The expression (12) 


is, of course, difficult to use in practice and we require an approximation similar 
to that of Koopmans. For this we make use of the integral expression (10) 


Fig. 2. [Diagram of the $(r_1, r_2)$ plane, with regions labeled by the linear contributions $l_{ij}$ for n = 9.]


for the joint distribution function of 7,, --- , 7m and approximate to the factor 
n ad 

| I (l—k x) | . This can be done without undue difficulty, but the resulting 
1=1 


multiple integral does not appear to be capable of easy reduction. This is hardly
surprising, since from the nature of the distribution of the $r_i$ we should expect
this approximation to involve $R_m$, raised to a suitable power, and this conjecture
is strengthened by the following considerations:

a) The distribution of $r_1$ may be obtained by considering the two sets of
observations $x_1, x_2, \ldots, x_{n-1}, x_n$ and $x_2, x_3, \ldots, x_n, x_1$ as unrelated, and using
the distribution of the ordinary correlation coefficient corresponding to $n + 3$
pairs of observations (Dixon [6], Quenouille [7]). In the same manner, the m sets
of observations $x_1, x_2, \ldots, x_{n-1}, x_n$; $x_2, x_3, \ldots, x_n, x_1$; $\ldots$; $x_m, x_{m+1}, \ldots, x_{m-2}, x_{m-1}$
might be considered as unrelated and the joint distribution of their
correlations, given by Gårding [5], will involve $R_m$ raised to a suitable power.


b) The outer limits for the joint distribution of 7) , ro, +--+ , 7m OV 71, 72, °°* , Fm 
for large n, will be provided by the equations R, = 0, (p = 1, ---, m). An 
investigation of the properties of the functions, R, , R2, --- , Rm might therefore 


be expected to throw light upon the joint distribution of 7, 12 
a 

c) $R_p$ is a quadratic in $r_p$ and may be put equal to $R_{p-2}(r_p' - r_p)(r_p - r_p'')$,
where $r_p'$ and $r_p''$ are functions of $r_1, r_2, \ldots, r_{p-1}$ giving the limits of the values
that $r_p$ can take for any particular values of $r_1, \ldots, r_{p-1}$. Let $Q_p = R_p/R_{p-1}$;
then $Q_p$ is likewise a quadratic in $r_p$, which takes all values between $r_p''$ and $r_p'$, and


$$\int_{r_p''}^{r_p'} Q_p^{s}\, dr_p = \frac{R_{p-2}^{s}}{R_{p-1}^{s}} \int_{r_p''}^{r_p'} (r_p' - r_p)^{s} (r_p - r_p'')^{s}\, dr_p = \frac{B(s + 1, \tfrac12)}{Q_{p-1}^{s}} \left(\frac{r_p' - r_p''}{2}\right)^{2s+1}.$$


But, by expanding $R_p$ as a bordered determinant, it is not difficult to show that
$r_p' - r_p'' = 2Q_{p-1}$, so that


$$\int_{r_p''}^{r_p'} Q_p^{s}\, dr_p = \frac{\Gamma(s + 1)\, \pi^{1/2}}{\Gamma(s + \tfrac32)}\, Q_{p-1}^{s+1}.$$
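The reduction in c) can be checked numerically for p = 2. The sketch below is an illustration only, under the assumption that $R_p$ is the $(p+1) \times (p+1)$ determinant with unit diagonal and entries $r_{|i-j|}$, so that $R_0 = 1$, $R_1 = 1 - r_1^2$ and the limits of $r_2$ are $2r_1^2 - 1$ and 1, as noted in Section 7. The function names are hypothetical, and the integral is approximated by a plain midpoint rule.

```python
import math


def beta(a, b):
    """Beta function B(a, b) expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def q2(r1, r2):
    """Q_2 = R_2 / R_1, using the assumed 3x3 and 2x2 determinants."""
    r2_det = 1.0 + 2.0 * r1 * r1 * r2 - r2 * r2 - 2.0 * r1 * r1
    return r2_det / (1.0 - r1 * r1)


def integral_q2_power(r1, s, steps=20000):
    """Midpoint-rule approximation of the integral of Q_2**s over the range of r2."""
    lo, hi = 2.0 * r1 * r1 - 1.0, 1.0
    h = (hi - lo) / steps
    return h * sum(q2(r1, lo + (k + 0.5) * h) ** s for k in range(steps))


def closed_form(r1, s):
    """B(s + 1, 1/2) * Q_1**(s + 1), with Q_1 = R_1 / R_0 = 1 - r1**2."""
    return beta(s + 1.0, 0.5) * (1.0 - r1 * r1) ** (s + 1.0)


if __name__ == "__main__":
    for r1, s in [(0.3, 2.0), (-0.6, 1.5)]:
        print(r1, s, integral_q2_power(r1, s), closed_form(r1, s))
```

The two columns printed for each pair $(r_1, s)$ should agree closely, which is consistent with the factor $\Gamma(s+1)\pi^{1/2}/\Gamma(s+\tfrac32) = B(s+1, \tfrac12)$.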


In particular, if 





"G cor 2 1 arn 
(13) f(r: ° ‘T.) = — rG red + te a — 7 ” 7 > ‘ ] §(n—2m+1 


r(an +2 Tan —m+ wmi2 e™ e 
and if we integrate with respect to 7» , %m-1, °** 72 In turn, we get 
er e? 
m T(an + 1) 2\3(n—1) 
| ae | f(n © eee Tm) Arm ae drs — = 1 (1 — p?y hn ’ 
rs x T(3n + 4)x! 


which is the approximate distribution of the first serial correlation coefficient, 
uncorrected for the mean, as given by Dixon [6]. 

The importance of this lies in the fact that the integral corresponding to that
of Koopmans for the joint distribution is


11 4n—m—1 0 | 


(1 93" m 
Pn) 71 [- [oe : * —— [|] | sin 3nz; = *(z,) dx, +++ dim 


Ir'(3n — m) <1 dx; X | 










where r’ = [r,, --+ , Tm], 


COS 21 COS X “+2 COS Im 
x =| ©o8 2x; COS 2% +--+ COS 2rm 


| 1 1 1 | 
| 
Y= COS 21 COS 22 COS Lm 
| 3. (m — 1)a, cos (m — 1)2,2 cos (m — 1)ap, | 

«’(9) = [cos 6, cos 26, --- , cos mé], 


and S is the region given by | . : > 0. This suggests, by analogy, that the 
joint distribution function is a polynomial in $r_m$, of degree $2(\tfrac12 n - m - 1) + 3 = n - 2m + 1$,
which vanishes only when $R_m = 0$. Equation (13) satisfies these
conditions and, in addition, it reduces to the known form when m = 1 and can
be integrated to give this same form. Thus there is a strong suggestion that (13)
gives an approximate distribution of 7) , 72, --+ , 7m, uncorrected for the mean. 

An alternative form for the constant factor in (13) may be obtained if we 
note that 


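$$\frac{\Gamma(\tfrac12 n - m + 2)}{\Gamma(\tfrac12 n - m + \tfrac32)\, \pi^{1/2}} = \frac{1}{2^{\,n-2m+2}}\, \frac{\Gamma(n - 2m + 3)}{[\Gamma(\tfrac12 n - m + \tfrac32)]^{2}}.$$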





d) Now r, and r, can be written in the forms (S,. + R,-1)/Rp-2 and 
(Sp-1 — Rp1)/Rp-2 , where 


rT} ie) r3 0 | 

1 ry To T p-1 
Y p—1 | 
Sp-1 = (—1)’ 1 1 rT Pl p—2 
Tp—2 p-3 Pp—4 1 | 


Thus 


a Rp E on & Ry-2 a Se) | 
Rp-2 Up-1 
Q> a Q,a(1 — Tint Pe 


/ 
where 11.9123... = Twi Ry ? 




and 
| lp T1 To To 
l p-1 1 Ti T p—2 | 
Tp. = | Tp2 M1 ] Tp | 
| 


| Ve to-8 Teg °** ] | 


Therefore, if we make a change of variable to 11,p-1.93... , Ti,p.23---5 °° * 713.2571, 
we find that the new variables which correspond exactly to partial correlation 
coefficients are, in fact, independently distributed as such, with 3 degrees of 
freedom more than in the case where the sets of variables are distinct observa- 
tions. 

While the above properties do not prove that the r; or 7; may be tested 
using partial or multiple correlation coefficients, this conjecture has been verified 
elsewhere and it has been shown [8] that, with certain adjustments, a test can be 
derived which is applicable to fairly short series. 


REFERENCES 

[1] R. L. ANDERSON, "Distribution of the serial correlation coefficient," Annals of Math.
Stat., Vol. 13 (1942), pp. 1-18.

[2] T. KOOPMANS, "Serial correlation and quadratic forms in normal variables," Annals of
Math. Stat., Vol. 13 (1943), pp. 14-33.

[3] M. S. BARTLETT, "On the theoretical specification of sampling properties of autocorrelated
time series," Roy. Stat. Soc. Suppl., Vol. 8 (1946), pp. 27-41.

[4] W. G. MADOW, "Note on the distribution of the serial correlation coefficient," Annals
of Math. Stat., Vol. 16 (1945), pp. 308-310.

[5] L. GÅRDING, Proceedings of Lund University Mathematical Seminars, Vol. 5, pp. 185-202.

[6] W. J. DIXON, "Further contributions to the problem of serial correlation," Annals of
Math. Stat., Vol. 15 (1944), pp. 119-144.

[7] M. H. QUENOUILLE, "Some results in the testing of the serial correlation coefficient,"
Biometrika, Vol. 35 (1948), pp. 261-7.

[8] M. H. QUENOUILLE, "Approximate tests of correlation in time series 1," Roy. Stat. Soc.
Suppl., Vol. 11 (1949).








ON THE ESTIMATION OF THE NUMBER OF CLASSES IN A
POPULATION¹


By Leo A. Goodman


Princeton University 


1. Summary. This paper deals with the following problem: Suppose a popula- 
tion of known size N is subdivided into an unknown number of mutually exclusive 
classes. It is assumed that the class in which an element is contained may be 
determined, but that the classes are not ordered. Let us draw a random sample 
of n elements without replacement from the population. The problem is to 
estimate the total number K of classes which subdivide the population on the 
basis of the sample results and our knowledge of the population size. 

There is exactly one real valued statistic S which is an unbiased estimate of K
when the sample size n is not less than the maximum number q of elements
contained in any class. The restriction placed upon q is unimportant for many
practical problems where either there is a reasonably low bound for q or those
classes containing more than n elements are known. An unbiased estimate does
not exist when there is no such knowledge.

Since the unbiased estimate can be very unreasonable, modifications of S are
considered. The statistic


$$T' = \begin{cases} S' = N - \dfrac{N(N-1)}{n(n-1)}\, x_2, & \text{if } S' \ge \sum_{i=1}^{n} x_i, \\[2ex] \sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i, \end{cases}$$


where $x_i$ is the number of classes containing i elements in the sample,
is the most suitable estimate, in comparison with three other statistics, for a
hypothetical population.

The case where each element in the population has an equal and independent 
chance of coming into the sample is used as a model for some sampling procedures 
and also as an approximation to the case of random sampling. 


2. Introduction. The problem discussed may be described in terms of colored 
balls in an urn. How should we estimate the number of colors present in the urn 
on the basis of both the sample which gives the number of, say, white balls, red 
balls, etc., and our knowledge of the total number of balls in the urn?

The following practical cases illustrate some of the ways in which this problem 
presents itself: 

(1) A company has received a large number of requests for a free sample of 
its product. It is known that the same people often send more than one request. 

1 Prepared in connection with research sponsored by the Office of Naval Research. 



ESTIMATION OF NUMBER OF CLASSES 373 


From a sample of the requests we wish to estimate how many different people
have sent requests.²

(2) The Social Security Board possesses a large collection of Social Security 
cards. It is known that some people obtain different cards when they change 
jobs. From a sample of the cards it is desired to estimate how many different 
people have Social Security cards.³

(3) A person who sells durable commodities anticipates opening a store 
which is to be located at a highway intersection. He would like to know how 
many different automobiles pass through the intersection in a given time period. 
The total number of automobiles may be easily observed but some probably 
pass through more than once. This type of inquiry is also useful to advertising 
agencies which must decide the most efficient location for billboards. 

(4) The State Unemployment Compensation Board possesses a large list 
of the people receiving unemployment benefits. It is desired to estimate the 
total number of families benefiting from the insurance program on the basis of a 
random sample of the people named on the list. 

(5) The number of words in a book may be easily estimated and a sample can 
be taken. The problem of estimating the number of different words in a book is 
another analogue of the general problem.⁴


3. Results and derivations. In order to show that an unbiased estimate of the
number of classes in a population exists when the sample size n is not less than
the maximum number q of elements contained in any class, we need prove the
following two statements:

Lemma 1. Suppose we have K classes of N similar elements with $n_1$ elements in
class 1, $n_2$ elements in class 2, $\ldots$, $n_K$ elements in class K. The class of an element
is readily identifiable when the element is examined. Let


$$q = \max_i\, (n_i).$$


Suppose a random sample is drawn without replacement. If $x_i$ is the number of
classes containing i elements in the sample, and $K_j$ is the number of classes containing
j elements in the population, then


$$E(x_i) = \sum_{j=i}^{q} \Pr(i \mid j, N, n)\, K_j,$$


where $\Pr(i \mid j, N, n)$ shall henceforth be an abbreviation of


$$\frac{C^{j}_{i}\, C^{N-j}_{n-i}}{C^{N}_{n}}.$$
2 Submitted by Charles Callard to Question and Answers, The American Statistician, 
Vol. 3, No. 1, p. 23. 


3 Mentioned to the author by Dr. J. Stevens Stock of Opinion Research Corporation. 
4 Mentioned in letter to the author from Frederick Mosteller of Harvard University. 








574 LEO A. GOODMAN 


Proof. Let $y_s$ be the number of elements appearing in the sample from the s-th
class. The statement is proved by considering $E(x_i) = \sum_{s=1}^{K} E(\delta_{i,y_s})$, where


$$\delta_{i,y_s} = \begin{cases} 1, & \text{if } y_s = i, \\ 0, & \text{if } y_s \ne i. \end{cases}$$


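Lemma 2. Let


$$a^{(t)} = \begin{cases} a(a-1)(a-2)\cdots(a-t+1), & \text{for } t > 0, \\ 1, & \text{for } t = 0. \end{cases}$$

If


$$A_i = 1 - (-1)^{i}\, \frac{(N - n + i - 1)^{(i)}}{n^{(i)}},$$


then

$$\sum_{i=1}^{n} A_i \Pr(i \mid j, N, n) = 1.^{5}$$

This result follows directly from the fact that

$$\sum_{i=0}^{j} (-1)^{i}\, C^{j}_{i}\, (N - n + i - 1)^{(j-1)} = 0, \qquad \text{for } j \ge 1.$$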


The following theorem may be proved directly by the preceding lemmas:
THEOREM 1. Suppose a sample of n elements is drawn without replacement from a
population of size N which is subdivided into K classes. Let


$$A_i = 1 - (-1)^{i}\, \frac{(N - n + i - 1)^{(i)}}{n^{(i)}}.$$


If there are $x_i$ classes containing i elements in the sample, then


$$E\Big(\sum_{i=1}^{n} A_i x_i\Big) = K,$$

provided that n is not less than the maximum number q of elements contained in
any class in the population.
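Lemma 2 and Theorem 1 lend themselves to an exact check with rational arithmetic. The sketch below is an illustration, not the author's computation: the function names are hypothetical, and the expectation of S is obtained by combining Lemma 1 with the coefficients $A_i$ rather than by enumerating samples.

```python
from fractions import Fraction
from math import comb


def falling(a, t):
    """a^(t) = a(a-1)...(a-t+1), with a^(0) = 1."""
    out = 1
    for k in range(t):
        out *= a - k
    return out


def pr(i, j, N, n):
    """Pr(i | j, N, n): a class of size j contributes i elements to the sample."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def A(i, N, n):
    """Coefficient A_i of Lemma 2 and Theorem 1."""
    return 1 - Fraction((-1) ** i * falling(N - n + i - 1, i), falling(n, i))


def lemma2_sum(j, N, n):
    """Sum over i = 1..n of A_i * Pr(i | j, N, n)."""
    return sum(A(i, N, n) * pr(i, j, N, n) for i in range(1, n + 1))


def expected_S(class_sizes, n):
    """E(S) for a population given by its class sizes, via Lemma 1."""
    N = sum(class_sizes)
    return sum(lemma2_sum(j, N, n) for j in class_sizes)


if __name__ == "__main__":
    sizes = [1, 2, 3, 4]              # K = 4 classes, N = 10, q = 4
    print(expected_S(sizes, 5))       # n >= q: equals K
    print(expected_S(sizes, 3))       # n < q: no longer equals K
```

With n = 5 the expectation equals the number of classes exactly, while with n = 3 (less than the largest class size) it does not, which is in line with the proviso of Theorem 1.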

THEOREM 2. There is at most one real valued statistic which is an unbiased
estimate of the number of classes in a population.⁶

Proof. Let us order the points of the sample space in the following manner:
Letting $x_i$ be the number of classes containing i elements in the sample, order
the sample points by increasing values of $x_n$; for equal values of $x_n$, order the
points by increasing values of $x_{n-1}$; for equal values of $x_{n-1}$, order the points


5 The author is indebted to Professor Frederick F. Stephan of Princeton University for 
a statement leading to a simplification of the original result. 
6 This statement was mentioned to the author by M. P. Peisakoff of Princeton University. 







by increasing values of $x_{n-2}$; $\ldots$; for equal values of $x_3$, order the points by
increasing $x_2$. Let


$$x_1 = n - \sum_{j=2}^{n} j\, x_j.$$


To prove the theorem, we must show that to each $O_i$ there corresponds a
unique value $S(i)$, which must be the value of our estimate when $O_i$ is observed,
in order that the statistic be unbiased. To each


$$O_i = [x_1(i), x_2(i), x_3(i), \ldots, x_n(i)],$$

let us associate the population


$$P_i = \Big[N - \sum_{j=2}^{n} j\, x_j(i),\; x_2(i),\; x_3(i),\; \ldots,\; x_n(i)\Big].$$

If $P_1$ is the underlying population, then $O_i$ for all $i > 1$ will occur with a
probability of zero. Since there are N classes in $P_1$, the value of the statistic must be
$S(1) = N$ whenever $O_1$ is observed in order that the estimate be unbiased.

The theorem may now be proved by induction. 

Since all the $P_i$ used in the proof of Theorem 2 satisfied the condition that the
maximum number q of elements contained in any class be not more than the
sample size n, the statistic S is the only real valued statistic which is an unbiased
estimate when $q \le n$.

When the restriction that $q \le n$ is removed, it is useless to search for an
unbiased estimate since we have

THEOREM 3. There does not exist an unbiased estimate of the number of classes 
subdividing a population when it is not known whether the maximum number q of 
elements contained in any class is not more than the sample size n. 

By the preceding theorems it is clear that if an unbiased estimate exists it 
must equal S. However, S is generally not unbiased when n < q. 

THEOREM 4. Suppose the statistics $S_1, S_2, \ldots, S_n$ are the solutions of the system
of linear equations

$$x_i = \sum_{j=1}^{n} \Pr(i \mid j, N, n)\, S_j, \qquad \text{for } i = 1, 2, \ldots, n,$$

where $x_i$ is the number of classes containing i elements in a sample of size n from a
population of N elements. If $K_j$ is the number of classes containing j elements in the
population, then $E(S_j) = K_j$, for $j = 1, 2, \ldots, n$, when n is not less than the
maximum number q of elements contained in any class.

Proof. We observe that the statement is certainly true for $j = q + 1, q + 2, \ldots, n$, since

$$E(S_j) = K_j = 0, \qquad \text{for } j = q + 1, q + 2, \ldots, n.$$

The statement is also true for $j = q$, since

$$E(S_q) = \frac{E(x_q)}{\Pr(q \mid q, N, n)} = K_q.$$







To prove that $E(S_j) = K_j$ for any $j < q$, we assume it to be true for all $i > j$,
whereupon its truth for j follows.
By Theorems 2 and 3, it is clear that $\sum_{j=1}^{n} S_j = S$. Since


$$\sum_{j=1}^{q} j K_j = N,$$


it seems reasonable to ask whether the values of the estimates $S_1, S_2, \ldots, S_n$
are in agreement with the known value of the size of the population. The unbiased
estimate of K can be shown to be internally consistent by 

THEOREM 5. Suppose a sample of size n is drawn without replacement from a
population of N elements which is divided into classes. If $x_i$ is the number of classes
containing i elements in the sample, and if the linear equations


$$x_i = \sum_{j=1}^{n} \Pr(i \mid j, N, n)\, S_j$$


are solved simultaneously for $S_j$, then

$$\sum_{j=1}^{n} j S_j = N.$$

The theorem follows readily from the fact that


$$\sum_{i=1}^{n} i \Pr(i \mid j, N, n) = \frac{nj}{N} \qquad \text{and} \qquad \sum_{i=1}^{n} i\, x_i = n.$$
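Since $\Pr(i \mid j, N, n) = 0$ for $j < i$, the system of Theorems 4 and 5 is triangular and can be solved by back-substitution from $j = n$ downwards. The following sketch illustrates this under the stated hypergeometric model; the names `pr` and `solve_class_size_estimates`, and the small sample used, are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def pr(i, j, N, n):
    """Pr(i | j, N, n) = C(j, i) C(N - j, n - i) / C(N, n); zero when i > j."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def solve_class_size_estimates(x, N):
    """Solve x_i = sum_j Pr(i | j, N, n) S_j for S_1, ..., S_n.

    x maps i to the number of classes with i elements in the sample;
    the sample size n is recovered as sum(i * x_i).
    """
    n = sum(i * c for i, c in x.items())
    S = {}
    for j in range(n, 0, -1):          # back-substitution: S_n first
        tail = sum(pr(j, k, N, n) * S[k] for k in range(j + 1, n + 1))
        S[j] = (x.get(j, 0) - tail) / pr(j, j, N, n)
    return S


if __name__ == "__main__":
    # Hypothetical sample: three singletons and one pair, from N = 20 elements.
    S = solve_class_size_estimates({1: 3, 2: 1}, N=20)
    print({j: str(v) for j, v in S.items()})
    print("sum of j * S_j:", sum(j * v for j, v in S.items()))
    print("sum of S_j:", sum(S.values()))
```

In this small example $\sum_j j S_j$ reproduces N exactly, as Theorem 5 asserts, while $S_1$ comes out negative, an instance of the unreasonable values discussed in the sequel.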


The variance of S may now be calculated by means of the formula


$$\sigma_S^2 = \sum_{i,j=1}^{n} A_i A_j u_{ij} = \sum_{i,j=1}^{n} A_i A_j \Big( \sum_{s,t=1}^{q} m_{st}(i, j)\, K_s K_t + \sum_{s=1}^{q} [m_s(i, j) - m_{ss}(i, j)]\, K_s \Big),$$


where $u_{ij}$ is the covariance between $x_i$ and $x_j$, $m_{st}(i, j)$ is the covariance between
$\delta_{i,y_r}$ and $\delta_{j,y_h}$ when $r \ne h$, $n_r = s$ and $n_h = t$, and $m_s(i, j)$ is the covariance between
$\delta_{i,y_r}$ and $\delta_{j,y_r}$ when $n_r = s$.

Since the statistic S can be very unreasonable, we consider other possible
estimates of K. The statistic


$$S' = N - \frac{N(N-1)}{n(n-1)}\, x_2$$


may be shown to be a modification of S which replaces the number $x_i$ of classes
containing $i > 2$ elements in the sample by an additional $i x_i$ classes, each
containing only one element. Since the values of $K_i$ for $i > 2$ are relatively small
in the practical problems of Section 2, $S'$ might be used as an estimate.
Another statistic which may be used to estimate K is


$$S'' = \frac{N}{n} \sum_{i=1}^{n} x_i.$$


This statistic may be shown to overestimate K whenever $q \ne 1$. The estimate


$$S''' = \sum_{i=1}^{n} x_i$$

underestimates K when $n < N - m$, where m is the least number of elements
contained in any class.
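The four statistics can be computed side by side from the sample counts $x_i$. The sketch below is illustrative only; the function names and the toy sample are assumptions, and exact rational arithmetic is used so that the values can be compared directly.

```python
from fractions import Fraction


def falling(a, t):
    """a^(t) = a(a-1)...(a-t+1), with a^(0) = 1."""
    out = 1
    for k in range(t):
        out *= a - k
    return out


def A(i, N, n):
    """Coefficient A_i of the unbiased estimate S."""
    return 1 - Fraction((-1) ** i * falling(N - n + i - 1, i), falling(n, i))


def estimates(x, N):
    """Return S, S', S'' and S''' for sample counts x (i -> x_i); needs n >= 2."""
    n = sum(i * c for i, c in x.items())
    distinct = sum(x.values())                       # classes seen in the sample
    return {
        "S": sum(A(i, N, n) * c for i, c in x.items()),
        "S'": N - Fraction(N * (N - 1), n * (n - 1)) * x.get(2, 0),
        "S''": Fraction(N, n) * distinct,
        "S'''": Fraction(distinct),
    }


if __name__ == "__main__":
    # Hypothetical sample of n = 5 from N = 20: three singletons and one pair.
    for name, value in estimates({1: 3, 2: 1}, N=20).items():
        print(name, value)
```

For this sample S and $S'$ both equal 1 although four distinct classes were observed, which shows in miniature why S "can be very unreasonable"; $S''$ gives 16 and $S'''$ gives 4.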


4. Binomial sampling. Let us suppose that each element from a population 
of N elements has an equal and independent chance p = 1/r of entering the 
sample s. In this case, the size of the sample obtained is a random variable 7 
which is binomially distributed with mean Np. If a large random sample of n 
elements is drawn without replacement from a large population of size N, then 
the results when interpreted in terms of binomial samples where p = 1/r = n/N 
are a good approximation to the results obtained by the usual model. Binomial 
sampling may be considered a model of the case where one attempts to obtain 
the sampling ratio p = 1/r by drawing simultaneously an uncounted sample of 
elements which is estimated as being of the appropriate size. 

In the case of binomial sampling, the statistic


$$B = \sum_{i=1}^{N} B_i x_i, \qquad \text{where } B_i = 1 - (1 - r)^{i},$$

may be shown to be an unbiased estimate of the number of classes in a population
from which binomial samples are drawn.
Let us now consider the statistic which corresponds to $S'$ for the case of
binomial sampling; i.e.,


$$B' = N - r^2 x_2.$$

It may be shown that

$$E(B') = K_1 + K_2 + \sum_{j=3}^{q} [\,j - C^{j}_{2} (1 - p)^{j-2}]\, K_j.$$


Hence, the statistic $B'$ will underestimate K whenever


$$p < 1 - \Big(\frac{2}{j}\Big)^{1/(j-2)}, \qquad \text{for } j = 3, 4, \ldots, q.$$

Since

$$1 - \Big(\frac{2}{j}\Big)^{1/(j-2)}$$

is a decreasing function of j for $j > 2$, when $p > \frac13$, $B'$ overestimates, and when


$$p < 1 - \Big(\frac{2}{q}\Big)^{1/(q-2)},$$

$B'$ underestimates the value of K. When p is such that


$$1 - \Big(\frac{2}{q}\Big)^{1/(q-2)} < p < \frac13,$$


the expected value of $B'$ is brought closer to K by underweighting some $K_j$
and overweighting others.


5. A hypothetical population.⁷ Suppose we draw a random sample of 1000
elements without replacement from a population of 10,000 elements where


$$K_1 = 9225, \quad K_2 = 336, \quad K_3 = 33, \quad K_4 = 1.$$


Hence, K = 9595. By means of Table 1, let us now compare on the basis of 
binomial sampling the estimates which have been presented in the preceding 
sections. Since N and n are large, these results are a good approximation to the 
case of random sampling without replacement. 








TABLE 1

Estimate    Expected value    Bias      √(Mean Square Error)
S           9595              0         347
S'          9570              -25       207
S''         9959              364       490
S'''        996               -8599     8600
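Some of the expected values in Table 1 can be recomputed from the formulas of Section 4. The sketch below is a hedged illustration: it assumes binomial sampling with p = 1/r = 0.1 and the class counts $K_1, \ldots, K_4$ given above, uses hypothetical variable names, and covers only the expected values of the binomial analogues of S, $S'$ and $S'''$ (not the mean square errors).

```python
from math import comb

# Hypothetical population of Section 5: K[j] classes containing j elements.
K = {1: 9225, 2: 336, 3: 33, 4: 1}
N = sum(j * k for j, k in K.items())     # population size
p = 0.1                                  # sampling ratio n / N
r = 1 / p
q = max(K)                               # largest class size


def expected_x(i):
    """E(x_i) under binomial sampling: classes of size j yield i elements."""
    return sum(comb(j, i) * p**i * (1 - p) ** (j - i) * kj
               for j, kj in K.items() if j >= i)


# B = sum B_i x_i with B_i = 1 - (1 - r)^i (binomial analogue of S).
E_B = sum((1 - (1 - r) ** i) * expected_x(i) for i in range(1, q + 1))
# B' = N - r^2 x_2 (binomial analogue of S').
E_B_prime = N - r**2 * expected_x(2)
# Number of distinct classes observed (analogue of S''').
E_distinct = sum(expected_x(i) for i in range(1, q + 1))

if __name__ == "__main__":
    print("N =", N, " K =", sum(K.values()))
    print("E(B)  =", round(E_B, 2))
    print("E(B') =", round(E_B_prime, 2))
    print("E(number of classes observed) =", round(E_distinct, 2))
```

Rounded to the nearest integer, these agree with the tabulated expected values for S, $S'$ and $S'''$.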

It is clear that the best estimates of the number of classes in this particular
population are S or $S'$, since S has the least bias, $E(S) - K$, and $S'$ has the
least mean square error, $E(S' - K)^2$. One might argue that both S and $S'$ are
statistics which are capable of giving nonsensical estimates. However, we
may decide to modify S or $S'$ in order to always get reasonable estimates by
using the statistics


$$T = \begin{cases} S, & \text{if } N \ge S \ge \sum_{i=1}^{n} x_i, \\ N, & \text{if } S > N, \\ \sum_{i=1}^{n} x_i, & \text{if } S < \sum_{i=1}^{n} x_i, \end{cases}$$

$$T' = \begin{cases} S', & \text{if } S' \ge \sum_{i=1}^{n} x_i, \\ \sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i. \end{cases}$$


7 Other examples have been investigated by Frederick Mosteller in Questions and 
Answers, The American Statistician, Vol. 3, No. 3, p. 12. 




Although these modified statistics T and T’ are not unbiased, they have the 
desirable property that 


MSE(T) < MSE(S), and MSE(T’) < MSE(S’). 


Since this hypothetical population is a plausible one for the practical problems
of Section 2, the modified statistics T or $T'$ seem, therefore, to be "best" for
estimating the number of classes for these problems, where the "best" statistic
is defined as the one which never gives unreasonable estimates and has the least
mean square error.

The author wishes to express his appreciation to Professor John W. Tukey 
whose suggestions were very helpful. 








CONCERNING COMPOUND RANDOMIZATION IN THE BINARY 
SYSTEM 


By John E. Walsh


The Rand Corporation 


1. Summary. Let us consider a set of approximately random binary digits 
obtained by some experimental process. This paper outlines a method of com- 
pounding the digits of this set to obtain a smaller set of binary digits which is 
much more nearly random. The method presented has the property that the 
number of digits in the compounded set is a reasonably large fraction (say of the 
magnitude 4 or 4) of the original number of digits. 

If a set of very nearly random decimal digits is required, this can be obtained 
by first finding a set of very nearly random binary digits and then converting 
these digits to decimal digits. 

The concept of “maximum bias” is introduced to measure the degree of 
randomness of a set of digits. A small maximum bias shows that the set is very 
nearly random. 

The question of when a table of approximately random digits can be considered 
suitable for use as a random digit table is investigated. It is found that a table 
will be satisfactory for the usual types of situations to which a random digit 
table is applied if the reciprocal of the number of digits in the table is noticeably 
greater than the maximum bias of the table. 


2. Introduction and discussion. With the development of the theory of games 
and the more widespread use of experimental methods for determining approxi- 
mate distributions for statistics whose probability laws are difficult to obtain 
analytically, a demand for large sets of random digits has arisen. The problem of 
obtaining a set of digits which can be considered sufficiently random for the 
situations to which it would be applied, however, is not an easy one. One approach 
to this problem consists in obtaining a set of digits by some procedure and then 
applying tests to this set of digits to determine whether it can be considered 
satisfactory. Although appropriate choice of the tests may result in acceptance 
of sets of digits which are suitable for certain special types of situations, this 
approach is of a negative character and does not prove that a given set of digits 
is sufficiently random; it merely indicates that this may be the case. What is 
needed is a constructive approach to the problem, i.e., a method of constructing a 
set of random digits which can be proved sufficiently random for most applica- 
tions if certain intuitively acceptable conditions are satisfied. A step in this 
direction has already been taken by H. Burke Horton in [1] and by H. Burke 
Horton and R. Tynes Smith III in [2]. This paper presents what is hoped will be 
another step in this direction. 

In this paper, considerations will be limited to the case of binary digits. The 
reasons for this are twofold: 






JOHN E. WALSH 581 


(a). The method used for compounding the digits yields a sharp upper 
bound for the maximum bias of the compounded set (i.e., a bound that 
the maximum bias could actually attain) only for the case of binary digits. 

(b). Many of the experimental procedures for obtaining approximately 
random digits consist in first producing binary digits and then converting 
to another number base. Thus binary digits are produced directly. 
Hence, to use the results of this paper, the only modification required in 
these procedures would be to compound the binary digits before they 
are converted. 

Now let us consider some definitions: A set of random variables each of which 
can assume only the values 0 and 1 will be referred to as a set of binary digits. 
For convenience, each of the random variables making up a set of binary digits 
will be called a binary digit; this is not to be confused with the value obtained 
for the random variable. The absolute value of the deviation from 1/2 of the
conditional probability that a specified binary digit has the value 0 (or 1) is 
called the bias of that digit for the given conditions on the remaining digits of 
the set. The maximum bias of a binary digit is defined to be the maximum of the 
biases of that digit with respect to all possible conditions on the remaining 
digits of the set. The maximum bias of the set is the greatest of the maximum 
biases of the digits of the set. A set of binary digits is said to be random if its 
maximum bias is zero. 
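The definition of maximum bias can be made concrete for a small set of digits whose joint distribution is known. The sketch below is an illustration and not part of the paper's method: the function name `maximum_bias` and the dictionary representation of the joint distribution are assumptions. It conditions only on complete assignments of the remaining digits; a condition on part of the remaining digits gives a conditional probability that is a weighted average of these, so it cannot produce a larger bias.

```python
from itertools import product


def maximum_bias(pmf, k):
    """Maximum bias of a set of k binary digits.

    pmf maps each k-tuple of 0/1 values to its probability. For every digit
    and every assignment of the remaining digits that has positive
    probability, the bias is |P(digit = 0 | rest) - 1/2|.
    """
    worst = 0.0
    for d in range(k):
        for rest in product((0, 1), repeat=k - 1):
            p0 = pmf.get(rest[:d] + (0,) + rest[d:], 0.0)
            p1 = pmf.get(rest[:d] + (1,) + rest[d:], 0.0)
            if p0 + p1 == 0:
                continue                  # condition of probability zero
            worst = max(worst, abs(p0 / (p0 + p1) - 0.5))
    return worst


if __name__ == "__main__":
    fair = {bits: 0.25 for bits in product((0, 1), repeat=2)}
    locked = {(0, 0): 0.5, (1, 1): 0.5}   # two identical fair digits
    print(maximum_bias(fair, 2))          # independent fair digits
    print(maximum_bias(locked, 2))        # each digit fair alone, yet fully determined
```

The second example shows why the definition conditions on the remaining digits: each digit taken alone is unbiased, but given the other digit its value is certain, so the maximum bias is 1/2.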

The method used to prove that a set of compounded digits has a sufficiently 
small maximum bias is somewhat similar to the situation encountered in mathe- 
matics where one begins with certain axioms and then draws conclusions. If the 
axioms are correct, the conclusions are necessarily valid. The first step in the 
compounding procedure consists in obtaining a set of binary digits by some 
experimental process (perhaps from a random digit machine which is based on 
some physical principle). The experimental process is so chosen that there is no 
doubt that the set of binary digits produced satisfies the two conditions: 

(i). The maximum bias of the set is less than or equal to some specified
value a (< 1/2).

(ii). The digits of the set can be arranged in a specified array which has the 
property that the rows of the array are statistically independent. 

On the basis of these two assumptions (which play the same role as the axioms 
mentioned above), it can be proved that the maximum bias of the resulting 
compounded set of binary digits never exceeds a specified value which depends 
on a. Moreover, the upper bound for the maximum bias of the constructed set of 
binary digits can be made extremely small even for large values of a. 

If the experimental process is suitably chosen, conditions (i) and (ii) can be 
satisfied beyond any doubt. For example, let us consider 1000 people located in 
different parts of the world and not in contact with each other. Let each person 
flip an ordinary coin high in the air so that it will land on a flat hard surface, 
record the result (say 0 for a tail and 1 for a head), and then repeat this procedure 
until 5000 binary digits are obtained. If a is set equal to 3/10, condition (i) is 





582 COMPOUND RANDOMIZATION 


obviously satisfied for the resulting set of 5,000,000 binary digits. Condition (ii) 
evidently holds if the array is taken to consist of 1000 rows where each row 
contains 5000 binary digits obtained from one person. 

The ideal choice for a would be the actual maximum bias of the set of binary 
digits obtained from the experimental process. Then the compounding procedure 
for obtaining a set of digits with a specified upper bound for the maximum bias 
would be simplified; also the number of digits in the compounded set would be a 
larger fraction of the original number of digits. Invariably, however, the proper- 
ties of the experimental process are not known with sufficient accuracy for 
obtaining anything but a safe upper bound on the maximum bias of the set of 
digits produced. This situation is analogous to that of estimating the length 
of a stick which a very rough measurement has shown to be about 10" long.
Although one might be very hesitant to believe that the length of the stick lies
between 9.9" and 10.1", the contention that the length lies between 5" and 15"
can be accepted with virtual certainty and any logical conclusions based on this 
contention can also be accepted with virtual certainty. 

Given the number of binary digits in a set and the maximum bias of the set, 
is it possible to determine whether the set is suitable for use as a set of random 
binary digits? An important consideration in answering this question is the use 
that is to be made of the set of digits. This must always be taken into account 
before the suitability of the set can be decided. For example, if no more than 
1/1000 of the digits of the set are to be used for any particular situation, the 
set might be satisfactory for the types of cases to which it would be applied; 
on the other hand, the set might not be suitable for cases of these types if all the 
digits of the set are used for each situation. This example calls attention to 
an important point, namely that the suitability of a set of binary digits depends 
on the number of digits in the set. Let a set have a fixed non-zero maximum 
bias p. If the set contains a sufficiently large number N of digits, relations and 
expressions involving the digits of the set can be found whose probabilities, 
moments, etc., can differ greatly from the values which would be obtained if the 
relations were based on the same number of truly random binary digits. As a 
specific example consider the relation 


All the digits of the set have the value zero. 


If the reciprocal of the number of digits in the set is of the same order of
magnitude as, or smaller than, the maximum bias of the set, the ratio of the
probability of this relation to its hypothetical value can differ noticeably from
unity. Thus, at least in certain special cases, a necessary condition for the
suitability of a set of binary digits is that $1/N \gg p$. This condition, however,
is also sufficient for most situations to which a set of random digits would be
applied. The approximate sufficiency of the condition is a direct consequence of
the fact that any set of N binary digits can be considered as a sample value from
an N-dimensional population consisting of $2^N$ discrete points. The $1/N \gg p$
restriction implies that the probability concentrated at each of the $2^N$ points is
very nearly equal to the hypothetical value of $(1/2)^N$ for all possible conditions
on the remaining digits of the set.
The $1/N \gg p$ condition is very satisfactory from the viewpoint of
probabilities. The probability of any relation based on a subset of the digits of the
set (possibly conditioned on other digits from the table) can be interpreted as the
sum of the probabilities of those points included in a certain region (defined by
the relation) of the N-dimensional probability space of the set of digits. By
expanding $(1/2 + p)^N$ it can be shown that the ratio of the probability of any
relation based on one or more digits from the set to the corresponding value for a
truly random set of digits will be very nearly equal to unity if $1/N \gg p$.

It is evident that the higher order moments of an expression based on one or
more digits of the set can differ noticeably from their hypothetical values even if
$1/N \gg p$; any deviation from the ideal situation, no matter how small, can
become important for high order moments. For the first few moments, however,
deviations from the hypothetical values are not appreciable since these moments
are based on the probabilities at the $2^N$ points in the N-dimensional probability
space and these probabilities are very nearly equal to the hypothetical value of
$(1/2)^N$ in all cases.

The above discussion shows that the values of N and p are sufficient to
determine whether a set of binary digits is suitable for use as random binary digits
for a wide variety of situations. Analogous considerations apply for digits to any
number base.

A magnitude definition of the relation $1/N \gg p$ is difficult to specify. If p
is the upper bound for the maximum bias of a set of digits obtained by the
compounding procedure outlined in this paper, however, it seems that a
reasonable condition would be that $1/N > 50p$. This condition implies that the
probability of any relation based on digits of the set can not differ from its
hypothetical value by more than approximately 4%. In most practical
applications the value obtained for p would be noticeably greater than the
true value of the maximum bias of the compounded set.
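The 4% figure can be checked numerically. The sketch below assumes the worst case in which every one of the N digits carries the full bias p, so that a single point has probability $(1/2 + p)^N$ instead of $(1/2)^N$. The function name and the choice N = 10,000 are illustrative assumptions, not taken from the paper.

```python
import math


def max_probability_ratio(n_digits: int, p: float) -> float:
    """Ratio of the worst-case point probability (1/2 + p)^N to the ideal (1/2)^N."""
    return (1.0 + 2.0 * p) ** n_digits


n = 10_000                 # illustrative table size (an assumption)
p = 1.0 / (50 * n)         # boundary case 1/N = 50 p
ratio = max_probability_ratio(n, p)

# (1 + 2p)^N never exceeds exp(2Np) = exp(0.04), i.e. a deviation of about 4%.
print(f"ratio = {ratio:.4f}, exp(2Np) = {math.exp(2 * n * p):.4f}")
```

Because the ratio depends essentially on the product Np, the same figure is obtained for any large N at the boundary $1/N = 50p$.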

Since the maximum number of digits which can be taken from a table is the
total number of digits in the table, the above considerations suggest that a
random digit table should be constructed so that the reciprocal of the number of
digits in the table is noticeably greater than the maximum bias of the table.
Any table having this property would be satisfactory for most situations to
which it would be applied.

Now let us consider two different compounding methods which produce sets
of binary digits with the same upper bound for the maximum bias. If the
computational difficulties of applying the two methods are of comparable magnitudes,
it seems reasonable to prefer the method which yields the larger set of digits.
For example, if the number of digits in the set obtained by the first method is
only 1/8 of the original number of digits while the number in the set obtained
by the second method is 1/3 of the original number, the second method would
seem preferable even if it required as much as 100% more computation.


The compounding method presented in this paper has the property that the 
number of digits in the compounded set can be held to a reasonably large fraction 
of the original number of digits at the same time that the upper bound for the 
maximum bias is made extremely small. The method presented by Horton in [1] 
does not have this property. For example, let $\alpha = 1/10$. Applying Horton's
method, when the compounded set consists of 1/8 of the original number of
digits the upper bound for the maximum bias is $12.8 \times 10^{-7}$. The example
presented in section 3, however, shows that a compounded set whose number of
digits equals 1/3 of the original number and which has an upper limit of
$11.7 \times 10^{-7}$ for the maximum bias can be obtained using the method presented in the
next section.

Although the compounding method outlined in section 3 is presented as a 
series of steps, the value of a digit of the compounded set can be written as a 
linear function (mod 2) of digits of the original set. This was not done in what 
follows because of the complicated nature of the general form of such expressions. 
In any particular case, however, these expressions can be written without much 
trouble and the compounded digits computed from the original digits in a 
single step. 


3. Outline of compounding method and statement of theorems. This section 
contains a description of the compounding method mentioned in the preceding 
two sections as well as statements of the basic theorems concerning this com- 
pounding method. Proofs of the results stated in this section are given in section 4. 

Let us consider the array of mn binary digits

\[
\begin{array}{cccc}
x_{11}, & x_{12}, & \cdots, & x_{1n} \\
x_{21}, & x_{22}, & \cdots, & x_{2n} \\
\vdots  & \vdots  &         & \vdots \\
x_{m1}, & x_{m2}, & \cdots, & x_{mn}
\end{array}
\tag{1}
\]

which satisfies conditions (i) and (ii); i.e., the maximum bias of the set (1) is
less than or equal to $\alpha$ while a digit $x_{rs}$ is independent of a digit $x_{uv}$ if $r \neq u$
(if $r = u$, however, $x_{rs}$ is not necessarily independent of $x_{uv}$).

Let a new set of $(m - 1)n$ binary digits

\[ y_{ij}, \qquad (i = 1, \cdots, m - 1;\ j = 1, \cdots, n) \tag{2} \]

be formed as follows:

\[ y_{ij} = x_{mj} + x_{ij} \pmod 2, \qquad (i = 1, \cdots, m - 1;\ j = 1, \cdots, n). \]
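The basic operation adds the last row of the array, digit by digit and modulo 2, to each of the other rows. A minimal sketch follows, assuming the array is held as a list of equal-length rows of 0/1 integers; the function name `compound_once` is a hypothetical label, not the paper's.

```python
def compound_once(rows):
    """Return the (m-1) x n array y[i][j] = (x[m][j] + x[i][j]) mod 2.

    `rows` is an m x n array of binary digits; its last row plays the role of
    row m in the text and is consumed by the operation.
    """
    *head, last = rows
    return [[(a + b) % 2 for a, b in zip(row, last)] for row in head]


x = [
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 1],   # row m
]
y = compound_once(x)
print(y)
```

An m-row input yields m - 1 rows, so the fraction of digits retained by one application is (m - 1)/m.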

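Then the biases of the $y_{ij}$ have the properties

THEOREM 1. Let U be a specified set of $t - 1$ of $y_{1j}, \cdots, y_{(i-1)j}, y_{(i+1)j}, \cdots,
y_{(m-1)j}$, $(1 \le t \le m - 1)$, while V is a specified set of zero or more of the $y_{pq}$'s
with $q \neq j$. Also let $\theta$ consist of the set of integers such that $p \in \theta$ if $y_{pj} \in U$. Then,
if $\gamma_u$ = maximum bias for the set $x_{u1}, \cdots, x_{un}$, $(u = 1, \cdots, m)$,

\[
\left| \Pr(y_{ij} = 0 \mid U, V) - \tfrac12 \right|
\le \gamma_i \left[ 1 - \prod_{k \in \theta \cup \{m\}} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\Bigg/ \left[ 1 + \prod_{k \in \theta \cup \{m\}} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\]

for all possible selections of U, V and of the values for the digits of these sets.

COROLLARY 1. If exactly $t - 1$ of $y_{1j}, \cdots, y_{(i-1)j}, y_{(i+1)j}, \cdots, y_{(m-1)j}$ have
known values, the maximum bias of the binary digit $y_{ij}$ is less than or equal to

\[ \alpha \left[ 1 - (\tfrac12 - \alpha)^t / (\tfrac12 + \alpha)^t \right] \Big/ \left[ 1 + (\tfrac12 - \alpha)^t / (\tfrac12 + \alpha)^t \right]. \]

COROLLARY 2. The maximum bias of the set (2) is less than or equal to

\[ \alpha \left[ 1 - (\tfrac12 - \alpha)^{m-1} / (\tfrac12 + \alpha)^{m-1} \right] \Big/ \left[ 1 + (\tfrac12 - \alpha)^{m-1} / (\tfrac12 + \alpha)^{m-1} \right]. \]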
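The bound of Corollary 1 is easy to evaluate. The sketch below assumes the bound has the form $\alpha[1 - r^t]/[1 + r^t]$ with $r = (1/2 - \alpha)/(1/2 + \alpha)$; the function name `bias_bound` is a hypothetical label. Corollary 2 is the case $t = m - 1$.

```python
def bias_bound(alpha: float, t: int) -> float:
    """Upper bound on the bias of a compounded digit when t - 1 digits of the
    same column are known (Corollary 1); Corollary 2 uses t = m - 1."""
    r = (0.5 - alpha) / (0.5 + alpha)
    return alpha * (1.0 - r**t) / (1.0 + r**t)


alpha = 0.1
for t in (1, 2, 5):
    print(t, bias_bound(alpha, t))
```

For t = 1 the bound reduces to $2\alpha^2$, and it increases with t while always staying below $\alpha$.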

The basic operation in the method of compounding binary digits is outlined
in the procedure given for obtaining the $y_{ij}$ from the $x_{rs}$. Let $m = (1 + t_1) \cdots
(1 + t_K)$. Then a set of $t_1 \cdots t_K n$ binary digits can be obtained from the original
set of mn digits $x_{rs}$ by continually applying this basic procedure. The first step
consists in dividing the rows of (1) into $(1 + t_2) \cdots (1 + t_K)$ sets each consisting
of $(1 + t_1)$ rows in some specified fashion. Each of these sets is an array of
$(1 + t_1) \times n$ binary digits for which the rows are independent. Apply the method
used to obtain the $y_{ij}$ from the $x_{rs}$ to each $(1 + t_1) \times n$ array separately. Then
each array yields a set of $t_1 n$ binary digits and there are $(1 + t_2) \cdots (1 + t_K)$
such sets. In each set arrange the $t_1 n$ digits into a single row in some specified
manner. This furnishes a new array of $[(1 + t_2) \cdots (1 + t_K)] \times [t_1 n]$ binary
digits for which the rows are independent. Repeat this procedure with respect to
$t_2$, thus obtaining a new array of $[(1 + t_3) \cdots (1 + t_K)] \times [t_1 t_2 n]$ binary digits for
which the rows are independent; etc., until a $(1 + t_K) \times (t_1 \cdots t_{K-1} n)$ binary
digit array for which the rows are independent is obtained. Then form a set of
binary digits $Y_{gh}$, $(g = 1, \cdots, t_K;\ h = 1, \cdots, t_1 \cdots t_{K-1} n)$, from this array in
exactly the same manner that the $y_{ij}$ were obtained from the $x_{rs}$. Then the
biases of the $Y_{gh}$ have the properties

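THEOREM 2. Let $\beta_0, \beta_1, \cdots, \beta_K$ be defined by $\beta_0 = \alpha$ and

\[
\beta_w = \beta_{w-1} \left[ 1 - (\tfrac12 - \beta_{w-1})^{t_w} / (\tfrac12 + \beta_{w-1})^{t_w} \right]
\Big/ \left[ 1 + (\tfrac12 - \beta_{w-1})^{t_w} / (\tfrac12 + \beta_{w-1})^{t_w} \right],
\qquad (w = 1, \cdots, K).
\]

Then, if exactly $t - 1$ of $Y_{1h}, \cdots, Y_{(g-1)h}, Y_{(g+1)h}, \cdots, Y_{t_K h}$ have known values,
$(1 \le t \le t_K)$, the maximum bias of the digit $Y_{gh}$ is less than or equal to

\[
\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right].
\]

In particular, the maximum bias of the entire set of $Y_{gh}$ is less than or equal to
$\beta_K$. Also

\[
\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\le 2^{2^K - 1}\, t\, t_{K-1}^{2}\, t_{K-2}^{4} \cdots t_1^{2^{K-1}}\, \alpha^{2^K}.
\tag{3}
\]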


The inequality (3) is frequently useful from a computational viewpoint. 
Although the right hand side of (3) is usually noticeably greater than the left 
hand side, in many cases this rough upper bound is itself small enough to show 
that the upper bound for the maximum bias is of the desired order of magnitude. 
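The recursion of Theorem 2 and the rough bound can be compared directly. The sketch below assumes the recursion $\beta_w = \beta_{w-1}[1 - r^{t_w}]/[1 + r^{t_w}]$ with $r = (1/2 - \beta_{w-1})/(1/2 + \beta_{w-1})$, and assumes that the right-hand side of (3), taken at $t = t_K$, is $2^{2^K - 1}\, t_K\, t_{K-1}^2 \cdots t_1^{2^{K-1}} \alpha^{2^K}$, which is what repeated use of the relation $\alpha[1 - r^t]/[1 + r^t] \le 2t\alpha^2$ gives. The function names and the values $\alpha = 1/10$, $(t_1, t_2) = (1, 2)$ are illustrative.

```python
def beta_sequence(alpha, ts):
    """Return [beta_0, beta_1, ..., beta_K] for stage sizes ts = (t_1, ..., t_K)."""
    betas = [alpha]
    for t in ts:
        prev = betas[-1]
        r = (0.5 - prev) / (0.5 + prev)
        betas.append(prev * (1.0 - r**t) / (1.0 + r**t))
    return betas


def rough_bound(alpha, ts):
    """Assumed closed form of the right-hand side of (3), evaluated at t = t_K."""
    K = len(ts)
    bound = 2.0 ** (2**K - 1) * alpha ** (2**K)
    for w, t in enumerate(ts, start=1):
        bound *= t ** (2 ** (K - w))   # t_K^1, t_{K-1}^2, ..., t_1^(2^(K-1))
    return bound


alpha, ts = 0.1, (1, 2)
betas = beta_sequence(alpha, ts)
print("beta_K      =", betas[-1])
print("rough bound =", rough_bound(alpha, ts))
```

The rough bound needs only integer powers and one multiplication chain, which is why it is convenient for a first screening before the exact recursion is evaluated.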

If the set of compounded digits is to be used for a random binary digit table,
Theorem 2 shows that advantage can be taken of the position of the digits in the
table. Let $M = t_1 \cdots t_{K-1} n$ and enter the values of the $Y_{gh}$, $(g = 1, \cdots, t_K;$
$h = 1, \cdots, M)$, into the table in the order

\[ Y_{11}, Y_{12}, \cdots, Y_{1M}, Y_{21}, \cdots, Y_{2M}, Y_{31}, \cdots, Y_{t_K 1}, \cdots, Y_{t_K M}. \]

Then, if a set of digits is taken from this table in consecutive order ($Y_{11}$ follows
$Y_{t_K M}$), the upper bound for the maximum bias of this set is dependent on the
number L of digits in the set. From Theorem 2, the maximum bias of a set of L
digits taken in consecutive order from a table formed in this manner is less
than or equal to

\[
\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\]

for values of L such that $(t - 1)M < L \le tM$, where $1 \le t \le t_K$. Thus, if a
small set of digits is taken from this table in consecutive order, the upper bound
for the maximum bias of this set will usually be noticeably smaller than the
upper bound for the maximum bias of the table. Since many uses of a random
digit table require only a small fraction of the total number of entries in this
table, this property would seem to be desirable. It should be emphasized,
however, that the maximum bias of a set taken from this table is always less than
or equal to $\beta_K$ irrespective of the positions that the digits of the sets occupy in
the table. Thus nothing is lost by constructing the table in this manner but
something can be gained for small sets if the digits are taken from the table in
consecutive order.
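The dependence of the bound on the run length L can be sketched as follows, assuming that t is the integer with $(t - 1)M < L \le tM$ and that $L \le t_K M$. The function name and the numerical values are illustrative assumptions.

```python
def run_bound(beta_prev: float, M: int, L: int) -> float:
    """Bound on the maximum bias of L consecutive table digits.

    beta_prev stands for beta_{K-1}; M is the length of one block of the table.
    """
    t = -(-L // M)                       # ceiling of L / M, so (t-1)M < L <= tM
    r = (0.5 - beta_prev) / (0.5 + beta_prev)
    return beta_prev * (1.0 - r**t) / (1.0 + r**t)


# Illustrative values: beta_{K-1} = 0.1 and blocks of M = 100 digits.
for L in (50, 100, 101, 250):
    print(L, run_bound(0.1, 100, L))
```

Any run of at most M digits gets the smallest bound (t = 1), and the bound steps up each time the run length passes a further multiple of M.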

Now let us consider situations in which it is required that the number of
digits in the compounded set is at least a specified fraction, say 1/C, of the
original number mn of binary digits. This requires that K and $t_1, \cdots, t_K$ be
chosen so that

\[ t_1 \cdots t_K / [(1 + t_1) \cdots (1 + t_K)] \ge 1/C. \]

Also, for given values of K and C, it seems preferable to choose $t_1, \cdots, t_K$ so that
the value of $\beta_K$ is at least approximately minimized. Examination of the results of
Theorem 2 indicates that a reasonable method of determining the values of
$t_1, \cdots, t_K$ with this in mind consists in first choosing $t_1$ as small as possible, then
(given the value of $t_1$ equal to its minimum value) choosing $t_2$ as small as possible,
etc. This method is also recommended by the fact that the resulting values of
$t_1, \cdots, t_K$ are readily determined. The explicit procedure for finding $t_1, \cdots, t_K$
is given by

THEOREM 3. Let the values of the integer K and the constant C (> 1) be given and
consider the integers $t_1, \cdots, t_K$ subject to the condition

\[ t_1 \cdots t_K / [(1 + t_1) \cdots (1 + t_K)] \ge 1/C. \]

The minimum value of $t_1$ is the smallest integer satisfying

\[ t_1 > 1/(C - 1). \]

In general, $2 \le w \le K - 1$, having already determined $t_1, \cdots, t_{w-1}$ as their
minimum values, the value of $t_w$ is the smallest integer satisfying

\[ t_w > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 \right]. \]

Finally, given $t_1, \cdots, t_{K-1}$ as their minimum values, the minimum value of $t_K$
is the smallest integer satisfying

\[ t_K \ge 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 \right]. \]
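The rule of Theorem 3 can be sketched with exact rational arithmetic. The code assumes the reading in which every $t_w$ with $w < K$ must exceed its bound strictly while the last one, $t_K$, may equal its bound; for K = 1 the single value is treated as the last one. The function name `choose_ts` is a hypothetical label.

```python
from fractions import Fraction
import math


def choose_ts(K: int, C) -> list:
    """Pick t_1, ..., t_K successively as small as possible (Theorem 3)."""
    C = Fraction(C)
    ts = []
    ratio = Fraction(1)          # t_1...t_{w-1} / ((1+t_1)...(1+t_{w-1}))
    for w in range(1, K + 1):
        denom = C * ratio - 1
        if denom <= 0:
            raise ValueError("no admissible value; C must exceed 1")
        bound = 1 / denom
        if w < K:
            t = math.floor(bound) + 1    # smallest integer strictly above bound
        else:
            t = math.ceil(bound)         # smallest integer at or above bound
        ts.append(t)
        ratio *= Fraction(t, 1 + t)
    return ts


for K in (1, 2):
    print(K, choose_ts(K, 3))
```

Using `Fraction` avoids rounding trouble when a bound is an exact integer, which is the case that separates the strict and non-strict inequalities.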


Now consider the general situation encountered in the application of the
compounding process outlined above. Here the values of $\alpha$, C are given and it is
required to choose K and $t_1, \cdots, t_K$ so that the upper bound for the maximum
bias of the compounded set of $t_1 \cdots t_K n$ binary digits $Y_{gh}$ is less than or equal to a
specified value b. The following procedure furnishes a method of solving this
problem:

Let K = 1, obtain $t_1$ according to Theorem 3, and then compute $\beta_1$. If $\beta_1 \le b$, a
solution has been obtained. If $\beta_1 > b$, let K = 2 and repeat the procedure to
obtain $\beta_2$. If $\beta_2 \le b$, the values of $t_1$, $t_2$ and K = 2 are a solution. If $\beta_2 > b$,
repeat the procedure for K = 3; etc. In practical situations, the value of K is
usually bounded (e.g., by independence properties of the original set of digits).
If $\beta_K$ is still greater than b for the maximum permissible value of K, no solution is
obtained. This means that either b must be increased or 1/C decreased or both
if a solution is to be found. In many cases, a large amount of computation can be
avoided by using the inequality (3). For marginal situations, however, a solution
may be missed by using (3) instead of computing $\beta_K$.
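The search over K can be sketched as a simple loop. To keep the example short, the stage sizes for each K are assumed to be supplied by the caller (for instance from Theorem 3); the candidate tuples, the target b = 0.001 and the function names below are hypothetical values chosen only for illustration.

```python
def final_beta(alpha, ts):
    """Apply the recursion of Theorem 2 and return beta_K."""
    beta = alpha
    for t in ts:
        r = (0.5 - beta) / (0.5 + beta)
        beta = beta * (1.0 - r**t) / (1.0 + r**t)
    return beta


def first_solution(alpha, b, candidates):
    """Try K = 1, 2, ... in order; candidates[K-1] holds (t_1, ..., t_K).

    Returns (K, ts, beta_K) for the first K with beta_K <= b, or None if the
    largest permitted K still gives beta_K > b.
    """
    for ts in candidates:
        beta = final_beta(alpha, ts)
        if beta <= b:
            return len(ts), ts, beta
    return None


# Hypothetical inputs: each tuple keeps at least 1/3 of the digits.
candidates = [(1,), (1, 2), (1, 3, 8)]
result = first_solution(0.1, 1e-3, candidates)
print(result)
```

A return value of `None` corresponds to the case in the text where b must be increased or 1/C decreased.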

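Example of method. The following table represents an example of application
of the above method:

    $\alpha = 1/10$      $1/C = 1/3$      $b = 2 \times 10^{?}$

    K = 1,  $t_1 = 1$                                      $\beta_1 = 2 \times 10^{-2}$
    K = 2,  $t_1 = 1$, $t_2 = 2$                           $\beta_2 < 1.6 \times 10^{-3}$
    K = 3,  $t_1 = 1$, $t_2 = 3$, $t_3 = 9$                $\beta_3 < 1.04 \times 10^{-4}$
    K = 4,  $t_1 = 1$, $t_2 = 3$, $t_3 = 10$, $t_4 = 44$   $\beta_4 < 1.17 \times 10^{-6}$


Thus K = 4, $t_1 = 1$, $t_2 = 3$, $t_3 = 10$, $t_4 = 44$ is a solution.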
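The tabulated bounds can be reproduced from the stage sizes. The sketch below assumes that the bounds were obtained from the rough inequality (3) in the closed form $2^{2^K - 1}\, t_K\, t_{K-1}^2 \cdots t_1^{2^{K-1}} \alpha^{2^K}$, and it also checks the fraction of digits retained. The function names are hypothetical labels.

```python
from fractions import Fraction


def rough_bound(alpha, ts):
    """Assumed closed form of the right-hand side of (3) at t = t_K."""
    K = len(ts)
    bound = 2.0 ** (2**K - 1) * alpha ** (2**K)
    for w, t in enumerate(ts, start=1):
        bound *= t ** (2 ** (K - w))
    return bound


def fraction_kept(ts):
    """t_1...t_K / ((1+t_1)...(1+t_K)) as an exact fraction."""
    kept = Fraction(1)
    for t in ts:
        kept *= Fraction(t, 1 + t)
    return kept


rows = [(1,), (1, 2), (1, 3, 9), (1, 3, 10, 44)]   # stage sizes from the table
for ts in rows:
    print(len(ts), ts, fraction_kept(ts), f"{rough_bound(0.1, ts):.4e}")
```

Under these assumptions the four rows give about 0.02, 0.0016, 0.000104 and 0.00000117, in line with the orders of magnitude quoted for $\alpha = 1/10$, and every row retains at least 1/3 of the digits.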


4. Derivations. The purpose of this section is to furnish proofs of the results 
stated in the preceding sections. 

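4.1 Proof of Theorem 1. Let us consider the conditional probability that an
arbitrary but fixed $y_{ij}$ has a specified value when the values of a fixed subset of
zero or more of the remaining y's are known. For convenience, assume that $y_{11}$
is the binary digit considered and that the values of $y_{21}, y_{31}, \cdots, y_{t1}$ (where t
is a fixed integer such that $1 \le t \le m - 1$) and a set S are given while the
values of the remaining y's are unknown. Here S represents an arbitrary but
fixed set of zero or more of the $y_{ij}$'s for which $j \ge 2$ while $t = 1$ has the
interpretation that none of the $y_{i1}$, $(i \ge 2)$, are given. Let

\[
\Pr(x_{m1} = 0 \mid S) = \tfrac12 + \alpha_{t+1}
\quad \text{and} \quad
\Pr(x_{k1} = b_k \mid S) = \tfrac12 + \alpha_k, \qquad (k = 1, \cdots, t).
\]

Then, using the independence conditions satisfied by the x's,

\[
\begin{aligned}
\Pr(y_{11} = b_1 \mid y_{21} = b_2, \cdots, y_{t1} = b_t;\ S)
&= \left[ \prod_{k=1}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=1}^{t+1} (\tfrac12 - \alpha_k) \right]
   \Bigg/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right] \\
&= \tfrac12 + \alpha_1 \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) - \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right]
   \Bigg/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right] \\
&= \tfrac12 + \alpha_1 \delta.
\end{aligned}
\]

Now $|\delta| = (1 - P)/(1 + P)$ if $0 \le P \le 1$ and equals $(P - 1)/(1 + P)$ if
$P > 1$, where $P = \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k)/(\tfrac12 + \alpha_k)$. Let $\gamma_u$ be the maximum bias for the
set of binary digits $x_{u1}, \cdots, x_{un}$, $(u = 1, \cdots, m)$. Then it is easily seen that

\[
|\alpha_1 \delta| \le \gamma_1 \left[ 1 - \prod_{k=2}^{t+1} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\Bigg/ \left[ 1 + \prod_{k=2}^{t+1} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right],
\]

where, as for $\alpha_{t+1}$, the index $t + 1$ refers to row m. Thus

\[
\left| \Pr(y_{11} = b_1 \mid y_{21} = b_2, \cdots, y_{t1} = b_t;\ S) - \tfrac12 \right|
\le \gamma_1 \left[ 1 - \prod_{k=2}^{t+1} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\Bigg/ \left[ 1 + \prod_{k=2}^{t+1} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\]

for all possible selections of $b_1, \cdots, b_t$ and all possible selections of S and the
values for the digits of S. It is to be observed that this inequality is valid for $t = 1$.
Evidently this result can be modified to apply to an arbitrary $y_{ij}$ for which
$t - 1$ of $y_{1j}, \cdots, y_{(i-1)j}, y_{(i+1)j}, \cdots, y_{(m-1)j}$ have given values. This obvious
modification results in Theorem 1.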
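The key identity of this proof can be checked by brute force. The sketch below assumes independent digits $x_1, \cdots, x_t$ and $x_m$ with $\Pr(x_k = 0) = 1/2 + a_k$, ignores the conditioning set S, and takes all given values $b_k$ equal to 0; the function names and the sample biases are hypothetical. It compares the enumerated conditional bias of $y_1 = x_m + x_1$ (mod 2) with $a_1 (1 - P)/(1 + P)$.

```python
from itertools import product


def enumerated_bias(a):
    """Pr(y_1 = 0 | y_2 = ... = y_t = 0) - 1/2 by summing over all outcomes.

    a[0..t-1] are the biases of x_1..x_t; a[t] is the bias of x_m.
    """
    t = len(a) - 1
    num = den = 0.0
    for bits in product((0, 1), repeat=t + 1):
        prob = 1.0
        for bit, ak in zip(bits, a):
            prob *= (0.5 + ak) if bit == 0 else (0.5 - ak)
        xm = bits[-1]
        y = [(xm + xk) % 2 for xk in bits[:-1]]
        if all(v == 0 for v in y[1:]):     # the conditioning event
            den += prob
            if y[0] == 0:
                num += prob
    return num / den - 0.5


def closed_form_bias(a):
    """a_1 (1 - P) / (1 + P), with P taken over every digit except x_1."""
    P = 1.0
    for ak in a[1:]:
        P *= (0.5 - ak) / (0.5 + ak)
    return a[0] * (1.0 - P) / (1.0 + P)


a = [0.10, 0.05, -0.20, 0.15]    # biases of x_1, x_2, x_3 and x_m (t = 3)
print(enumerated_bias(a), closed_form_bias(a))
```

The two values agree to rounding error, which illustrates why the bound involves t factors: the t - 1 known digits and the common digit $x_m$.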
4.2 Proof of Theorem 2. By Corollary 2, the maximum bias of the $[(1 + t_2) \cdots
(1 + t_K)] \times [t_1 n]$ array is less than or equal to $\beta_1$. In general, $2 \le w < K$, by
Corollary 2 the maximum bias of the $[(1 + t_{w+1}) \cdots (1 + t_K)] \times [t_1 \cdots t_w n]$
array is less than or equal to $\beta_w$. Finally, by Corollary 1, if exactly $t - 1$ of
$Y_{1h}, \cdots, Y_{(g-1)h}, Y_{(g+1)h}, \cdots, Y_{t_K h}$ have known values, $(1 \le t \le t_K)$, the
maximum bias for the binary digit $Y_{gh}$ is less than or equal to

\[
\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right]
\Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t} / (\tfrac12 + \beta_{K-1})^{t} \right].
\]

The inequality (3) is an immediate consequence of the relation

\[
\alpha \left[ 1 - (\tfrac12 - \alpha)^{t} / (\tfrac12 + \alpha)^{t} \right]
\Big/ \left[ 1 + (\tfrac12 - \alpha)^{t} / (\tfrac12 + \alpha)^{t} \right] \le 2t\alpha^2.
\]
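4.3 Proof of Theorem 3. From the given condition

\[ t_K \ge 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 \right]. \]

From this inequality for $t_K$ it follows that

\[ \frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 > 0. \]

Thus

\[ t_{K-1} > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-2}}{(1 + t_1) \cdots (1 + t_{K-2})} - 1 \right]. \]

In general, $3 \le w \le K - 1$, given

\[ t_w > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 \right] \]

it follows that

\[ \frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 > 0 \]

whence

\[ t_{w-1} > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-2}}{(1 + t_1) \cdots (1 + t_{w-2})} - 1 \right]. \]

Finally

\[ t_1 > 1/(C - 1). \]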


REFERENCES 
[1] H. Burke Horton, "A method for obtaining random numbers," Annals of Math. Stat.,
Vol. 19 (1948), pp. 81-85.
[2] H. Burke Horton and R. Tynes Smith III, "A direct method for producing random
digits in any number system," Annals of Math. Stat., Vol. 20 (1949), pp. 82-90.








THE DISTRIBUTION OF EXTREME VALUES IN SAMPLES WHOSE 
MEMBERS ARE SUBJECT TO A MARKOFF CHAIN CONDITION 


By BENJAMIN EPSTEIN 


Department of Mathematics, Wayne University 


1. Introduction. The extreme value problem as treated in the literature 
concerns itself with the following question: To find the distribution of the 
smallest, largest, or more generally the vth largest, or vth smallest values in 
random samples of size n, drawn from a distribution whose probability law is 
given by the d.f. F(x). In this formulation the observed sample values $x_1, \cdots, x_n$
are assumed to be statistically independent. While the assumption of inde- 
pendence may be a good approximation to the true state of affairs in some 
cases, there are situations where this assumption is not justified. 

Suppose, for instance, that the observations in the sample are ordered in time. 
Then it may happen that successive observations are stochastically dependent, 
the extent of this dependence being a function of the time interval separating 
these observations.$^1$ In such cases the present distribution theory for extreme
values in samples of size n is inadequate and must be replaced by more general 
results. 

It is clear that a clean-cut analytic solution to the problem of the distribution 
of extreme values in samples whose members may be stochastically dependent 
can be expected only for certain special kinds of dependence among successive 
observations. We are able, in this paper, to obtain the distribution of smallest, 
largest, second smallest, and second largest values in samples of size n drawn at 
equally spaced time intervals from a stationary Markoff process. 


2. The distribution of smallest and largest values in samples of size n drawn 
at equally spaced time intervals from a stationary Markoff process. In this 
section the following assumption is made: 

(A) observations $x_1, x_2, \cdots, x_n, \cdots$ are taken in order at times $t = 1$,
$t = 2, \cdots, t = n, \cdots$ from a stationary Markoff random process.

The only information needed in the investigation of a stationary Markoff
process at integral values of time is the function

\[ F_2(x, y) = \mathrm{Prob}\,(x_i \le x,\ x_{i+1} \le y), \tag{1} \]

independently of i, where $F_2(x, y)$ must be such that the marginal distribution
obtained by integrating over x or y (if $x_i$ or $x_{i+1}$ take on a continuous range of
values) or summing over the possible values of $x_i$ or $x_{i+1}$ (if $x_i$ and $x_{i+1}$ can take
on only discrete values) is of the form

\[ F_1(x) = \mathrm{Prob}\,(x_i \le x), \tag{2} \]

independently of i.

$^1$ If the observations $x_1, x_2, \cdots, x_n, \cdots$ are taken at discrete times $t_1, t_2, \cdots, t_n, \cdots$
a measure of stochastic dependence between $x_i$ and $x_j$ is the ordinary coefficient of
correlation $r_{ij}$. If the observations are taken from a continuous stochastic process a natural
measure of stochastic dependence between observations made at two different times is the
covariance function of the process. In this paper we shall limit ourselves to processes which
are discrete in time.

An example of a random process meeting condition A is furnished by the
Ornstein-Uhlenbeck process [1; 2]. In this case the joint d.f. of $x_i$ and $x_{i+1}$ is
given by a non-singular bivariate Gaussian distribution. The results in the
present paper are stated completely in terms of the d.f.'s $F_2(x, y)$ and $F_1(x)$
defining the stationary Markoff process and will in particular be valid for
observations taken at uniformly spaced time intervals from an Ornstein-Uhlenbeck
process.

In this section we shall find the distribution of smallest and largest values in
samples $x_1, x_2, \cdots, x_n$ drawn from a random process under assumption A and
specified by the bivariate d.f. $F_2(x, y)$ and the associated one dimensional
marginal d.f. $F_1(x)$. We first prove Theorem I.

THEOREM I. Under assumption A, the distribution of largest values in samples of
size n is given by the d.f. $G_n^{(1)}(x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}$.

To prove this result we note that $G_n^{(1)}(x)$, the probability that the largest
value in samples of size n is $\le x$, is given by

\[ G_n^{(1)}(x) = \mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_n \le x). \tag{3} \]

To evaluate the right-hand side of (3) we proceed as follows:

\[
\mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_n \le x) =
\mathrm{Prob}\,(x_1 \le x, \cdots, x_{n-1} \le x)\,
\mathrm{Prob}\,(x_n \le x \mid x_1 \le x, \cdots, x_{n-1} \le x).
\tag{4}
\]

But under assumption A, (4) becomes

\[
\mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_n \le x) =
\mathrm{Prob}\,(x_1 \le x, \cdots, x_{n-1} \le x)\,
\mathrm{Prob}\,(x_n \le x \mid x_{n-1} \le x)
\tag{5}
\]

or

\[ G_n^{(1)}(x) = G_{n-1}^{(1)}(x)\, \mathrm{Prob}\,(x_n \le x \mid x_{n-1} \le x). \tag{5$'$} \]

But according to assumption A, and (1) and (2)

\[
\mathrm{Prob}\,(x_n \le x \mid x_{n-1} \le x)
= \mathrm{Prob}\,(x_{n-1} \le x, x_n \le x)/\mathrm{Prob}\,(x_{n-1} \le x)
= F_2(x, x)/F_1(x).
\tag{6}
\]

Therefore

\[
\begin{aligned}
G_n^{(1)}(x) &= G_{n-1}^{(1)}(x)\, F_2(x, x)/F_1(x) \\
&= G_1^{(1)}(x)\, (F_2(x, x))^{n-1}/(F_1(x))^{n-1} \\
&= (F_2(x, x))^{n-1}/(F_1(x))^{n-2}.
\end{aligned}
\tag{7}
\]

This proves Theorem I.
For n = 1, 2, and 3 respectively one gets

\[ G_1^{(1)}(x) = F_1(x), \qquad G_2^{(1)}(x) = F_2(x, x), \qquad G_3^{(1)}(x) = (F_2(x, x))^2/F_1(x). \tag{8} \]
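The formula of Theorem I is a one-line computation once the two d.f. values at the point x are known. The sketch below assumes the form $[F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}$ and takes the numbers $F_1(x)$ and $F_2(x, x)$ as inputs; the function name and the sample values are hypothetical. In the independent case $F_2(x, x) = F_1(x)^2$ it should reduce to the classical $F_1(x)^n$.

```python
def largest_cdf(F1: float, F2: float, n: int) -> float:
    """d.f. of the largest of n observations, given F1 = F_1(x), F2 = F_2(x, x)."""
    return F2 ** (n - 1) / F1 ** (n - 2)


F1 = 0.7
print(largest_cdf(F1, F1**2, 5), F1**5)     # independent case
print(largest_cdf(F1, 0.6, 5))              # positively dependent neighbours
```

With $F_2(x, x) > F_1(x)^2$, as for positively correlated neighbours, the value exceeds the independent one, so the largest value is stochastically smaller.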
THEOREM II. Under assumption A, the distribution of smallest values in samples
of size n is given by the d.f.

\[ H_n^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}. \tag{9} \]

To prove this result we first note that $H_n^{(1)}(x)$, the probability that the smallest
value in samples of size n be $\le x$, is given by

\[ H_n^{(1)}(x) = 1 - \mathrm{Prob}\,(x_1 > x, x_2 > x, \cdots, x_n > x). \]

To evaluate $H_n^{(1)}(x)$ we proceed as follows:

\[
\mathrm{Prob}\,(x_1 > x, x_2 > x, \cdots, x_n > x) =
\mathrm{Prob}\,(x_1 > x, \cdots, x_{n-1} > x)\,
\mathrm{Prob}\,(x_n > x \mid x_1 > x, \cdots, x_{n-1} > x).
\tag{10}
\]

But under assumption A, (10) becomes

\[
\mathrm{Prob}\,(x_1 > x, x_2 > x, \cdots, x_n > x) =
\mathrm{Prob}\,(x_1 > x, \cdots, x_{n-1} > x) \cdot
\mathrm{Prob}\,(x_n > x \mid x_{n-1} > x).
\tag{11}
\]

But

\[ \mathrm{Prob}\,(x_n > x \mid x_{n-1} > x) = \mathrm{Prob}\,(x_{n-1} > x, x_n > x)/\mathrm{Prob}\,(x_{n-1} > x). \tag{12} \]

To evaluate $\mathrm{Prob}\,(x_{n-1} > x, x_n > x)$ we note that

\[
\begin{aligned}
&\mathrm{Prob}\,(x_{n-1} > x, x_n > x) + \mathrm{Prob}\,(x_{n-1} \le x, x_n > x) \\
&\quad + \mathrm{Prob}\,(x_{n-1} > x, x_n \le x) + \mathrm{Prob}\,(x_{n-1} \le x, x_n \le x) = 1,
\end{aligned}
\tag{13}
\]

\[ \mathrm{Prob}\,(x_{n-1} \le x, x_n > x) + \mathrm{Prob}\,(x_{n-1} \le x, x_n \le x) = \mathrm{Prob}\,(x_{n-1} \le x), \tag{14} \]

\[ \mathrm{Prob}\,(x_{n-1} > x, x_n \le x) + \mathrm{Prob}\,(x_{n-1} \le x, x_n \le x) = \mathrm{Prob}\,(x_n \le x). \tag{15} \]

Recalling that

\[ F_2(x, x) = \mathrm{Prob}\,(x_{n-1} \le x, x_n \le x) \tag{16} \]

and

\[ F_1(x) = \mathrm{Prob}\,(x_{n-1} \le x) = \mathrm{Prob}\,(x_n \le x) \tag{17} \]

we get

\[ \mathrm{Prob}\,(x_{n-1} > x, x_n > x) = 1 - 2F_1(x) + F_2(x, x). \tag{18} \]

Therefore (10) becomes

\[
\mathrm{Prob}\,(x_1 > x, x_2 > x, \cdots, x_{n-1} > x, x_n > x) =
\mathrm{Prob}\,(x_1 > x, \cdots, x_{n-1} > x)\,[1 - 2F_1(x) + F_2(x, x)]/(1 - F_1(x)).
\tag{19}
\]

Applying the recursion formula (19) successively we obtain

\[
\begin{aligned}
\mathrm{Prob}\,(x_1 > x, x_2 > x, \cdots, x_n > x)
&= \mathrm{Prob}\,(x_1 > x)\,[1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-1} \\
&= [1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-2}.
\end{aligned}
\tag{20}
\]

Therefore $H_n^{(1)}(x)$, the probability that the smallest value in samples of size n
is $\le x$, is given by:

\[ H_n^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}. \tag{21} \]

This completes the proof of Theorem II.

In particular for n = 1, 2, and 3 respectively the d.f.'s of the smallest value in
samples of size n are given by:

\[
H_1^{(1)}(x) = F_1(x), \qquad H_2^{(1)}(x) = 2F_1(x) - F_2(x, x), \qquad
H_3^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^2}{1 - F_1(x)}.
\tag{22}
\]
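The d.f. of the smallest value can be evaluated in the same way as that of the largest. The sketch below assumes the form $1 - [1 - 2F_1 + F_2]^{n-1}/[1 - F_1]^{n-2}$, with $F_1 = F_1(x)$ and $F_2 = F_2(x, x)$ passed as numbers; the function name and the sample values are hypothetical. For independent observations it should give $1 - (1 - F_1)^n$.

```python
def smallest_cdf(F1: float, F2: float, n: int) -> float:
    """d.f. of the smallest of n observations, given F1 = F_1(x), F2 = F_2(x, x)."""
    both_above = 1.0 - 2.0 * F1 + F2          # Prob(x_{n-1} > x, x_n > x)
    return 1.0 - both_above ** (n - 1) / (1.0 - F1) ** (n - 2)


F1 = 0.3
print(smallest_cdf(F1, F1**2, 4), 1 - (1 - F1) ** 4)   # independent case
print(smallest_cdf(F1, 0.15, 4))                       # dependent neighbours
```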


3. Distribution of the second largest and second smallest values in samples
of size n drawn at equally spaced time intervals from a stationary Markoff
process. Under assumption A of Section 2 we can state the following theorem.

THEOREM III. Under assumption A the distribution of second largest values in
samples of size n, $n \ge 2$, is given by the d.f. $G_n^{(2)}(x)$,

\[
\begin{aligned}
G_n^{(2)}(x) ={}& [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2} \\
&+ 2[F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}/[F_1(x)]^{n-2} \\
&+ (n - 2)[F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2/[F_1(x)]^{n-3}(1 - F_1(x)).
\end{aligned}
\]

To prove this result we first note that $G_n^{(2)}(x)$, the probability that the second
largest value is $\le x$, is given by

\[
\begin{aligned}
G_n^{(2)}(x) ={}& \mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_n \le x) \\
&+ \mathrm{Prob}\,(x_1 > x, x_2 \le x, \cdots, x_n \le x) \\
&+ \mathrm{Prob}\,(x_1 \le x, x_2 > x, x_3 \le x, \cdots, x_n \le x) + \cdots \\
&+ \mathrm{Prob}\,(x_1 \le x, \cdots, x_{n-2} \le x, x_{n-1} > x, x_n \le x) \\
&+ \mathrm{Prob}\,(x_1 \le x, \cdots, x_{n-1} \le x, x_n > x).
\end{aligned}
\tag{23}
\]

According to Theorem I

\[ \mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_n \le x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}. \tag{24} \]

It can readily be shown that

\[
\begin{aligned}
\mathrm{Prob}\,(x_1 > x, x_2 \le x, x_3 \le x, \cdots, x_n \le x)
&= \mathrm{Prob}\,(x_1 \le x, x_2 \le x, \cdots, x_{n-1} \le x, x_n > x) \\
&= [F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}/[F_1(x)]^{n-2}.
\end{aligned}
\tag{25}
\]

It can also be shown that each of the remaining (n - 2) terms on the right-hand
side of (23) is equal to

\[ [F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2/[F_1(x)]^{n-3}(1 - F_1(x)). \tag{26} \]

Combining (23), (24), (25), and (26) we get the desired result in Theorem
III, i.e.,

\[
\begin{aligned}
G_n^{(2)}(x) ={}& [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2} \\
&+ 2[F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}/[F_1(x)]^{n-2} \\
&+ (n - 2)[F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2/[F_1(x)]^{n-3}(1 - F_1(x)).
\end{aligned}
\tag{27}
\]
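The three groups of terms in Theorem III translate directly into code. The sketch below assumes the reconstruction in which the interior term is $F_2^{n-3}(F_1 - F_2)^2 / [F_1^{n-3}(1 - F_1)]$, with $F_1 = F_1(x)$ and $F_2 = F_2(x, x)$ given as numbers strictly between 0 and 1; the function name is hypothetical. For independent observations the result should equal $F_1^n + nF_1^{n-1}(1 - F_1)$.

```python
def second_largest_cdf(F1: float, F2: float, n: int) -> float:
    """d.f. of the second largest of n >= 2 observations (Theorem III)."""
    d = F1 - F2                                     # Prob(x_i > x, x_{i+1} <= x)
    none_above = F2 ** (n - 1) / F1 ** (n - 2)      # no observation exceeds x
    end_above = F2 ** (n - 2) * d / F1 ** (n - 2)   # only x_1 (or only x_n) exceeds x
    inner_above = F2 ** (n - 3) * d**2 / (F1 ** (n - 3) * (1.0 - F1))
    return none_above + 2.0 * end_above + (n - 2) * inner_above


F1, n = 0.7, 5
print(second_largest_cdf(F1, F1**2, n), F1**n + n * F1 ** (n - 1) * (1 - F1))
```

For n = 2 the interior term drops out and the value is $2F_1 - F_2$, the d.f. of the smaller of two observations, as it should be.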


In a similar way one can prove Theorem IV.
THEOREM IV. Under assumption A, the distribution of second smallest values in
samples of size n, $n \ge 2$, is given by the d.f. $H_n^{(2)}(x)$,

\[
\begin{aligned}
H_n^{(2)}(x) = 1 &- \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}} \\
&- 2\,\frac{[1 - 2F_1(x) + F_2(x, x)]^{n-2}}{[1 - F_1(x)]^{n-2}}\,\{F_1(x) - F_2(x, x)\} \\
&- (n - 2)\,\frac{[1 - 2F_1(x) + F_2(x, x)]^{n-3}\,\{F_1(x) - F_2(x, x)\}^2}{[1 - F_1(x)]^{n-3}\,F_1(x)}.
\end{aligned}
\tag{28}
\]
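Theorem IV mirrors Theorem III with the roles of "at most x" and "above x" exchanged. The sketch below assumes the reconstructed form of (28), with $F_1 = F_1(x)$ and $F_2 = F_2(x, x)$ passed as numbers strictly between 0 and 1 and $1 - 2F_1 + F_2 > 0$; the function name is hypothetical. For independent observations the result should equal $1 - (1 - F_1)^n - nF_1(1 - F_1)^{n-1}$.

```python
def second_smallest_cdf(F1: float, F2: float, n: int) -> float:
    """d.f. of the second smallest of n >= 2 observations (Theorem IV)."""
    q = 1.0 - 2.0 * F1 + F2          # Prob(x_i > x, x_{i+1} > x)
    d = F1 - F2                      # Prob(x_i <= x, x_{i+1} > x)
    none_below = q ** (n - 1) / (1.0 - F1) ** (n - 2)
    end_below = q ** (n - 2) * d / (1.0 - F1) ** (n - 2)
    inner_below = q ** (n - 3) * d**2 / ((1.0 - F1) ** (n - 3) * F1)
    return 1.0 - none_below - 2.0 * end_below - (n - 2) * inner_below


F1, n = 0.7, 5
print(second_smallest_cdf(F1, F1**2, n),
      1 - (1 - F1) ** n - n * F1 * (1 - F1) ** (n - 1))
```

For n = 2 the value reduces to $F_2(x, x)$, the d.f. of the larger of two observations.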





REFERENCES 
[1] J. L. Doob, "The Brownian movement and stochastic equations," Annals of
Mathematics, Vol. 43 (1942), pp. 351.
[2] M. C. Wang and G. E. Uhlenbeck, "On the theory of the Brownian motion II," Reviews
of Modern Physics, Vol. 17 (1945), p. 323.


NOTES 


This section is devoted to brief research and expository articles and other short items. 


NOTE ON THE CONSISTENCY OF THE MAXIMUM LIKELIHOOD
ESTIMATE$^1$


By ABRAHAM WALD 


Columbia University 


1. Introduction. The problem of consistency of the maximum likelihood 
estimate has been treated in the literature by several authors (see, for example, 
Doob [1]$^2$ and Cramér [2]$^3$). The purpose of this note is to give another proof of the
consistency of the maximum likelihood estimate which may be of interest because 
of its relative simplicity and because of the easy verifiability of the underlying 
assumptions. The present proof has some common features with that given by 
Doob, insofar that both proofs make no differentiability assumptions (thus, not 
even the existence of the likelihood equation is postulated) and both are based 
on the strong law of large numbers and an inequality involving the log of a 
random variable. The assumptions in the present note are stronger in some 
respects than those made by Doob, but also the results obtained here are stronger. 
For the sake of simplicity, the author did not attempt to give the most general 
results or to weaken the underlying assumptions as much as possible. Remarks 
on possible generalizations are made in Section 4. 

Let $X_1, X_2, \cdots$, etc. be independently and identically distributed chance
variables. The most frequently considered case in the literature is that where
the common distribution is known, except for the values of a finite number of
parameters, $\theta^1, \theta^2, \cdots, \theta^k$. In this note we shall treat the parametric case. For
any parameter point $\theta = (\theta^1, \cdots, \theta^k)$, let $F(x, \theta)$ denote the corresponding
cumulative distribution function of $X_i$; i.e., $F(x, \theta) = \text{prob.}\,\{X_i < x\}$. The
totality $\Omega$ of all possible parameter points is called the parameter space. Thus,
the parameter space $\Omega$ is a subset of the k-dimensional Cartesian space.

$^1$ The author wishes to thank J. L. Doob for several comments and suggestions he made
in connection with this note.

$^2$ According to a communication from Doob, his Theorem 4 is incorrect, but is correct if
the class of almost everywhere continuous functions in that theorem is replaced by a suitable
class C of functions. The class C can be any one of a variety of classes; for example, the class
of bounded almost everywhere continuous functions, or the larger class of almost
everywhere continuous functions each of which is less than or equal in modulus to any one of a
prescribed sequence of functions with finite expectations. His Theorem 5 on the consistency
of the maximum likelihood is then dependent on the class C used in Theorem 4.

$^3$ The proof given by Cramér [2], pp. 500-504, establishes the consistency of some root
of the likelihood equation but not necessarily that of the maximum likelihood estimate
when the likelihood equation has several roots. Recently, Huzurbazar [3] showed that
under certain regularity conditions the likelihood equation has at most one consistent
solution and that the likelihood function has a relative maximum for such a solution.
Since there may be several solutions for which the likelihood function has relative maxima,
Cramér's and Huzurbazar's results taken together still do not imply that a solution of the
likelihood equation which makes the likelihood function an absolute maximum is necessarily
consistent.

It is assumed in this note that for any $\theta$, the cumulative distribution function
$F(x, \theta)$ admits an elementary probability law $f(x, \theta)$. If $F(x, \theta)$ is absolutely
continuous, $f(x, \theta)$ denotes the density at x. If $F(x, \theta)$ is discrete, $f(x, \theta)$ is equal
to the probability that $X_i = x$.

Throughout this note the following assumptions will be made. 

ASSUMPTION 1. $F(x, \theta)$ is either discrete for all $\theta$ or is absolutely continuous
for all $\theta$.

Before formulating the next assumption, we shall introduce the following
notations: for any $\theta$ and for any positive value $\rho$ let $f(x, \theta, \rho)$ be the supremum of
$f(x, \theta')$ with respect to $\theta'$ when $|\theta - \theta'| \le \rho$. For any positive $r$, let $\varphi(x, r)$
be the supremum of $f(x, \theta)$ with respect to $\theta$ when $|\theta| > r$. Furthermore, let
$f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) > 1$, and $= 1$ otherwise. Similarly, let
$\varphi^*(x, r) = \varphi(x, r)$ when $\varphi(x, r) > 1$, and $= 1$ otherwise.

ASSUMPTION 2. For sufficiently small $\rho$ and for sufficiently large $r$ the expected
values $\int \log f^*(x, \theta, \rho)\, dF(x, \theta_0)$ and $\int \log \varphi^*(x, r)\, dF(x, \theta_0)$ are finite, where
$\theta_0$ denotes the true parameter point.⁴

ASSUMPTION 3. If $\lim_{i\to\infty} \theta_i = \theta$, then $\lim_{i\to\infty} f(x, \theta_i) = f(x, \theta)$ for all $x$ except perhaps
on a set which may depend on the limit point $\theta$ (but not on the sequence $\theta_i$) and
whose probability measure is zero according to the probability distribution
corresponding to the true parameter point $\theta_0$.

ASSUMPTION 4. If $\theta_1$ is a parameter point different from the true parameter point
$\theta_0$, then $F(x, \theta_1) \ne F(x, \theta_0)$ for at least one value of $x$.

ASSUMPTION 5. If $\lim_{i\to\infty} |\theta_i| = \infty$, then $\lim_{i\to\infty} f(x, \theta_i) = 0$ for any $x$ except perhaps
on a fixed set (independent of the sequence $\theta_i$) whose probability is zero according
to the true parameter point $\theta_0$.

ASSUMPTION 6. For the true parameter point $\theta_0$ we have

$\int |\log f(x, \theta_0)|\, dF(x, \theta_0) < \infty.$

ASSUMPTION 7. The parameter space $\Omega$ is a closed subset of the k-dimensional
Cartesian space.

ASSUMPTION 8. $f(x, \theta, \rho)$ is a measurable function of $x$ for any $\theta$ and $\rho$.

It is of interest to note that if we forbid the dependence of the exceptional set
on $\theta$ in Assumption 3, Assumption 8 is a consequence of Assumption 3, as can
easily be verified.


⁴ The measurability of the functions $f^*(x, \theta, \rho)$ and $\varphi^*(x, r)$ for any $\theta$, $\rho$ and $r$ follows
easily from Assumption 8.










In the discrete case, Assumption 8 is unnecessary. In fact, we may replace
$f(x, \theta, \rho)$ everywhere by $\tilde f(x, \theta, \rho)$ where $\tilde f(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta_0) > 0$,
and $\tilde f(x, \theta, \rho) = 1$ when $f(x, \theta_0) = 0$. Here $\theta_0$ denotes the true parameter point.
Since $f(x, \theta_0) > 0$ only for countably many values of $x$, $\tilde f(x, \theta, \rho)$ is obviously a
measurable function of $x$.

In the absolutely continuous case, $F(x, \theta)$ does not determine $f(x, \theta)$ uniquely.
If Assumptions 3, 5 and 8 hold for one choice of $f(x, \theta)$, they do not necessarily
hold for another choice of $f(x, \theta)$. This is in a way undesirable, but assumptions
of such nature are unavoidable if we want to insure the consistency of the
maximum likelihood estimate. It is, however, possible to formulate assumptions
which remain valid for all possible choices of $f(x, \theta)$ and which insure the
consistency of the maximum likelihood estimate for a particular choice of $f(x, \theta)$.
In this connection the following remark due to Doob is of interest. Let
Assumptions 3' and 5' be the same as 3 and 5, respectively, except that the exceptional
set is permitted to depend on the sequence $\theta_i$. If 3' and 5' hold for one choice of
$f(x, \theta)$, they also hold for any other choice. Doob has shown that Assumptions 3'
and 5' insure the existence of a choice of $f(x, \theta)$ for which Assumptions 3, 5 and 8
hold. Thus, one may say that Assumptions 3' and 5' are the essential ones and
the stronger Assumptions 3, 5 and 8 are needed merely to exclude a 'bad'
choice of $f(x, \theta)$.


2. Some lemmas. In this section we shall prove some lemmas which will be
used in the next section to obtain the main theorems. Let $\theta_0$ be the true parameter
point. By the expected value $Eu$ of any chance variable $u$ we shall mean the
expected value determined under the assumption that $\theta_0$ is the true parameter
point. For any chance variable $u$, $u'$ will denote the chance variable which is
equal to $u$ when $u > 0$ and equal to zero otherwise. Similarly, for any chance
variable $u$, the symbol $u''$ will be used to denote the chance variable which is
equal to $u$ when $u < 0$ and equal to zero otherwise. We shall say that the expected
value of $u$ exists if $Eu' < \infty$. If the expected value of $u'$ is finite but that of $u''$
is not, we shall say that the expected value of $u$ is equal to $-\infty$.

LEMMA 1. For any $\theta \ne \theta_0$ we have

(1)    $E \log f(X, \theta) < E \log f(X, \theta_0)$

where $X$ is a chance variable with the distribution $F(x, \theta_0)$.

PROOF. It follows from Assumption 2 that the expected values in (1) exist.
Because of Assumption 6, we have

(2)    $E\,|\log f(X, \theta_0)| < \infty.$

If $E \log f(X, \theta) = -\infty$, Lemma 1 obviously holds. Thus, we shall merely consider
the case when $E \log f(X, \theta) > -\infty$. Then

(3)    $E\,|\log f(X, \theta)| < \infty.$

Let $u = \log f(X, \theta) - \log f(X, \theta_0)$. Clearly, $E|u| < \infty$. It is known that for
any chance variable $u$ which is not equal to a constant (with probability one)
and for which $E|u| < \infty$, we have⁶

(4)    $Eu < \log E e^{u}.$

Since in our case

(5)    $E e^{u} \le 1,$

and since $u$ differs from zero on a set of positive probability (due to Assumption
4), we obtain from (4)

(6)    $Eu < 0.$

Thus, Lemma 1 is proved.
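Inequality (1) can be checked numerically for a concrete family. The following is a minimal sketch, assuming the normal location family N(θ, 1) as the elementary probability law (this family, the integration range, and the function names are illustrative assumptions, not part of the note). For this family the gap E log f(X, θ₀) − E log f(X, θ) equals (θ − θ₀)²/2, which is positive whenever θ ≠ θ₀.

```python
import math


def log_normal_density(x, theta):
    """log f(x, theta) for the N(theta, 1) family (an illustrative choice)."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


def expected_log_density(theta, theta0, lo=-12.0, hi=12.0, steps=4000):
    """Trapezoidal approximation of E log f(X, theta) when X ~ N(theta0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density0 = math.exp(log_normal_density(x, theta0))
        total += weight * log_normal_density(x, theta) * density0
    return total * h


theta0 = 0.0
baseline = expected_log_density(theta0, theta0)
# Gap between the two sides of (1) for several wrong parameter values.
gaps = {theta: baseline - expected_log_density(theta, theta0)
        for theta in (-2.0, -0.5, 0.5, 2.0)}
```

Every entry of `gaps` should be positive, in line with Lemma 1.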
We shall now prove the following lemma. 
LEMMA 2. $\lim_{\rho\to 0} E \log f(X, \theta, \rho) = E \log f(X, \theta)$.

PROOF. Let $f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \ge 1$, and $= 1$ otherwise.
Similarly, let $f^*(x, \theta) = f(x, \theta)$ when $f(x, \theta) \ge 1$, and $= 1$ otherwise. It follows
from Assumption 3 that

(7)    $\lim_{\rho\to 0} \log f^*(x, \theta, \rho) = \log f^*(x, \theta)$

except perhaps on a set whose probability measure is zero. Since $\log f^*(x, \theta, \rho)$
is an increasing function of $\rho$, it follows from (7) and Assumption 2 that

(8)    $\lim_{\rho\to 0} E \log f^*(X, \theta, \rho) = E \log f^*(X, \theta).$

Let $f^{**}(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \le 1$, and $= 1$ otherwise. Similarly, let
$f^{**}(x, \theta) = f(x, \theta)$ when $f(x, \theta) \le 1$, and $= 1$ otherwise. Clearly,

(9)    $|\log f^{**}(x, \theta, \rho)| \le |\log f^{**}(x, \theta)|$

and

(10)    $\lim_{\rho\to 0} \log f^{**}(x, \theta, \rho) = \log f^{**}(x, \theta)$

for all $x$ except perhaps on a set whose probability measure is zero. The relation

(11)    $\lim_{\rho\to 0} E \log f^{**}(X, \theta, \rho) = E \log f^{**}(X, \theta)$

follows from (9) and (10) in both cases, when $E \log f^{**}(X, \theta)$ is finite and when
$E \log f^{**}(X, \theta) = -\infty$. Lemma 2 is an immediate consequence of (8) and (11).

LEMMA 3. The equation

(12)    $\lim_{r\to\infty} E \log \varphi(X, r) = -\infty$

holds.





⁵ It is of no consequence what value is assigned to $u$ when $f(x, \theta)$ or $f(x, \theta_0)$ is zero, since
the probability of such an event, because of (3), is zero.

⁶ This is a generalization of the inequality between geometric and arithmetic means.
See, for example, Hardy, Littlewood, Pólya, Inequalities, Cambridge 1934, p. 137,
Theorem 184.




PROOF. It follows from Assumption 5 that

(13)    $\lim_{r\to\infty} \log \varphi(x, r) = -\infty$

for any $x$ (except perhaps on a set of probability 0). Since according to
Assumption 2,

(14)    $E \log \varphi^*(X, r) < \infty,$

and since $\log \varphi(x, r) - \log \varphi^*(x, r)$ and $\log \varphi^*(x, r)$ are decreasing functions of
$r$, Lemma 3 follows easily from (13).


3. The main theorems. We shall now prove the following theorems.

THEOREM 1. Let $\omega$ be any closed subset of the parameter space $\Omega$ which does not
contain the true parameter point $\theta_0$. Then

(15)    $\mathrm{prob}\left\{ \lim_{n\to\infty} \dfrac{\sup_{\theta\in\omega} f(X_1, \theta) f(X_2, \theta) \cdots f(X_n, \theta)}{f(X_1, \theta_0) f(X_2, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1.$








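PROOF. Let $r_0$ be a positive number chosen such that

(16)    $E \log \varphi(X, r_0) < E \log f(X, \theta_0).$

The existence of such a positive number follows from Lemma 3. Let $\omega_1$ be the
subset of $\omega$ consisting of all points $\theta$ of $\omega$ for which $|\theta| \le r_0$. With each point $\theta$
in $\omega_1$ we associate a positive value $\rho_\theta$ such that

(17)    $E \log f(X, \theta, \rho_\theta) < E \log f(X, \theta_0).$

The existence of such a $\rho_\theta$ follows from Lemmas 1 and 2. Since the set $\omega_1$ is
compact, there exists a finite number of points $\theta_1, \ldots, \theta_h$ in $\omega_1$ such that
$S(\theta_1, \rho_{\theta_1}) + \cdots + S(\theta_h, \rho_{\theta_h})$ contains $\omega_1$ as a subset. Here $S(\theta, \rho)$ denotes the
sphere with center $\theta$ and radius $\rho$. Clearly,

$0 \le \sup_{\theta\in\omega} f(x_1, \theta) \cdots f(x_n, \theta) \le \sum_{i=1}^{h} f(x_1, \theta_i, \rho_{\theta_i}) \cdots f(x_n, \theta_i, \rho_{\theta_i}) + \varphi(x_1, r_0) \cdots \varphi(x_n, r_0).$

Hence, Theorem 1 is proved if we can show that

(18)    $\mathrm{prob}\left\{ \lim_{n\to\infty} \dfrac{f(X_1, \theta_i, \rho_{\theta_i}) \cdots f(X_n, \theta_i, \rho_{\theta_i})}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1 \quad (i = 1, \ldots, h)$

and

(19)    $\mathrm{prob}\left\{ \lim_{n\to\infty} \dfrac{\varphi(X_1, r_0) \cdots \varphi(X_n, r_0)}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1.$

The above equations can be written as

(20)    $\mathrm{prob}\left\{ \lim_{n\to\infty} \sum_{\alpha=1}^{n} [\log f(X_\alpha, \theta_i, \rho_{\theta_i}) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1 \quad (i = 1, \ldots, h)$

and

(21)    $\mathrm{prob}\left\{ \lim_{n\to\infty} \sum_{\alpha=1}^{n} [\log \varphi(X_\alpha, r_0) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1.$

These equations follow immediately from (16), (17) and the strong law of large
numbers. This completes the proof of Theorem 1.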
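The behaviour asserted in (15) can be observed on simulated data. The sketch below assumes the N(θ, 1) family and takes ω to be the closed set of all θ with |θ − θ₀| ≥ ε; both choices, the seed, and the helper name are illustrative assumptions. For this family the logarithm of the ratio in (15) has a closed form: writing m for the sample mean minus θ₀ and d = θ − θ₀, the log ratio at θ is n(dm − d²/2), and its supremum over |d| ≥ ε is attained at the admissible d closest to m.

```python
import random


def log_sup_likelihood_ratio(sample, theta0, eps):
    """Log of the ratio in (15) for N(theta, 1) and omega = {|theta - theta0| >= eps}."""
    n = len(sample)
    m = sum(sample) / n - theta0          # sample mean, centred at theta0
    if abs(m) >= eps:
        d = m                             # unconstrained maximizer lies in omega
    else:
        d = eps if m >= 0 else -eps       # nearest boundary point of omega
    return n * (d * m - 0.5 * d * d)


rng = random.Random(0)
theta0, eps = 1.0, 0.5
sample = [rng.gauss(theta0, 1.0) for _ in range(4000)]
log_ratios = {n: log_sup_likelihood_ratio(sample[:n], theta0, eps)
              for n in (10, 100, 1000, 4000)}
```

For large n the entries of `log_ratios` should be strongly negative, so that the ratio itself is close to zero.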
THEOREM 2. Let $\bar\theta_n(x_1, \ldots, x_n)$ be a function of the observations $x_1, \ldots, x_n$
such that

(22)    $\dfrac{f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$ for all $n$ and for all $x_1, \ldots, x_n$.

Then

(23)    $\mathrm{prob}\{\lim_{n\to\infty} \bar\theta_n = \theta_0\} = 1.$





PROOF. It is sufficient to prove that for any $\epsilon > 0$ the probability is one that all
limit points $\bar\theta$ of the sequence $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. The
event that there exists a limit point $\bar\theta$ of the sequence $\{\bar\theta_n\}$ such that $|\bar\theta - \theta_0| > \epsilon$
implies that $\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta) \ge f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)$ for infinitely
many $n$. But then

(24)    $\dfrac{\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$

for infinitely many $n$. Since, according to Theorem 1, this is an event with
probability zero, we have shown that the probability is one that all limit points
$\bar\theta$ of $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. This completes the proof of
Theorem 2.

Since a maximum likelihood estimate $\hat\theta_n(x_1, \ldots, x_n)$, if it exists, obviously
satisfies (22) with $c = 1$, Theorem 2 establishes the consistency of $\hat\theta_n(x_1, \ldots, x_n)$
as an estimate of $\theta$.
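A small simulation can illustrate the consistency statement for an absolute maximum of the likelihood. The sketch below uses the Cauchy location family, whose likelihood equation may have several roots; it is assumed here, without verification, that this family satisfies Assumptions 1-8. The estimate is taken as the maximizer over a finite grid, which only approximates the true maximum likelihood estimate; the grid, the seed, and the function names are illustrative assumptions.

```python
import math
import random


def cauchy_log_likelihood(sample, theta):
    """Log likelihood of the Cauchy location model, up to an additive constant."""
    return -sum(math.log(1.0 + (x - theta) ** 2) for x in sample)


def grid_mle(sample, lo=-5.0, hi=5.0, points=501):
    """Return the grid point with the largest log likelihood (absolute maximum on the grid)."""
    grid = [lo + (hi - lo) * k / (points - 1) for k in range(points)]
    return max(grid, key=lambda theta: cauchy_log_likelihood(sample, theta))


rng = random.Random(1949)
theta0 = 1.0
# Inverse-CDF sampling of Cauchy observations centred at theta0.
sample = [theta0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(500)]
estimates = {n: grid_mle(sample[:n]) for n in (20, 100, 500)}
```

As n grows, the entries of `estimates` should settle near θ₀ = 1.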





4. Remarks on possible generalizations. The method given in this note can be 
extended to establish the consistency of the maximum likelihood estimates for 
certain types of dependent chance variables for which the strong law of large 
numbers remains valid. 

The assumption that the parameter space $\Omega$ is a subset of a finite dimensional
Cartesian space is unnecessarily restrictive. Let $\Omega$ be any abstract space. All of
our results can easily be shown to remain valid if Assumptions 3, 5 and 7 are
replaced by the following one:

ASSUMPTION 9. It is possible to introduce a distance $\delta(\theta_1, \theta_2)$ in the space $\Omega$ such
that the following four conditions hold:

(i) The distance $\delta(\theta_1, \theta_2)$ makes $\Omega$ into a metric space.

(ii) $\lim_{i\to\infty} f(x, \theta_i) = f(x, \theta)$ if $\lim_{i\to\infty} \theta_i = \theta$, for any $x$ except perhaps on a set which
may depend on $\theta$ (but not on the sequence $\theta_i$) and whose probability measure is zero
according to the probability distribution corresponding to the true parameter point $\theta_0$.

(iii) If $\theta_0$ is a fixed point in $\Omega$ and $\lim_{i\to\infty} \delta(\theta_i, \theta_0) = \infty$, then $\lim_{i\to\infty} f(x, \theta_i) = 0$
for any $x$.

(iv) Any closed and bounded subset of $\Omega$ is compact.


REFERENCES 


[1] J. L. Doob, "Probability and statistics," Trans. Amer. Math. Soc., Vol. 36 (1934).
[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.
[3] V. S. Huzurbazar, "The likelihood equation, consistency and the maxima of the
likelihood function," Annals of Eugenics, Vol. 14 (1948).




ON WALD’S PROOF OF THE CONSISTENCY OF THE MAXIMUM 
LIKELIHOOD ESTIMATE 


By J. Wolfowitz
Columbia University 


This note is written by way of comment on the pretty and ingenious proof of 
the consistency of the maximum likelihood estimate which is due to Wald and is 
printed in the present issue of the Annals. The notation of this paper of Wald’s 
will henceforth be assumed unless the contrary is specified. 

The consistency of the maximum likelihood estimate is a ‘“‘weak” rather than 
a “strong” property, in the technical meaning which these words have in the 
theory of probability, i.e., it is a property of distribution functions rather than of 
infinite sequences of observations. Prof. Wald actually proves strong convergence, 
which is more than consistency. His proof uses the strong law of large numbers, 
and he remarks that his method ‘‘can be extended to establish consistency of the 
maximum likelihood estimates for certain types of dependent chance variables 
for which the strong law of large numbers remains valid.” Below we shall use 
Wald’s lemmas to give a proof of consistency which employs only the weak law 
of large numbers. Not only does this proof have the advantage of being expedi- 
tious, but it can be extended to a larger class of dependent chance variables. 

The consistency of the maximum likelihood estimate follows from the following 

THEOREM. Let $\eta$ and $\epsilon$ be given, arbitrarily small, positive numbers. Let $S(\theta_0, \eta)$
be the open sphere with center $\theta_0$ and radius $\eta$, and let $\Omega(\eta) = \Omega - S(\theta_0, \eta)$. Let
Wald's Assumptions 1-8 hold. There exists a number $h(\eta)$, $0 < h < 1$, and another
positive number $N(\eta, \epsilon)$ such that, for any $n > N(\eta, \epsilon)$,

$P_0\left\{ \dfrac{\sup_{\theta\in\Omega(\eta)} \prod_{i=1}^{n} f(X_i, \theta)}{\prod_{i=1}^{n} f(X_i, \theta_0)} > h^n \right\} < \epsilon,$

where $P_0$ is the probability of the relation in braces according to $f(x, \theta_0)$.
PROOF: Proceed exactly as in the proof of Wald's Theorem 1 and obtain $r_0$,
$\rho_{\theta_1}, \ldots, \rho_{\theta_h}$, so that the set theoretic sum of the open spheres $S(\theta_i, \rho_{\theta_i})$, $i = 1,
2, \ldots, h$, covers the compact set which is the intersection of $\Omega(\eta)$ with the
sphere $|\theta| \le r_0$. Define $T(\theta_i)$, $i = 1, \ldots, h + 1$, as follows:

$-2T(\theta_i) = E \log f(X, \theta_i, \rho_{\theta_i}) - E \log f(X, \theta_0) \quad (i = 1, \ldots, h)$

$-2T(\theta_{h+1}) = E \log \varphi(X, r_0) - E \log f(X, \theta_0).$

If any of the right members above are infinite let $T(\theta_i)$ be one, say. Thus all
$T(\theta_i)$ are positive. Applying the weak law of large numbers we have that, for any
$i$ such that $1 \le i \le h + 1$, there exists a positive number $N_i$ such that, when
$n > N_i$,

$P_0\left\{ \dfrac{\prod_{j=1}^{n} f(X_j, \theta_i, \rho_{\theta_i})}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_i)) \right\} < \dfrac{\epsilon}{h+1} \quad (i = 1, \ldots, h)$

$P_0\left\{ \dfrac{\prod_{j=1}^{n} \varphi(X_j, r_0)}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_{h+1})) \right\} < \dfrac{\epsilon}{h+1}.$

From this the theorem follows immediately, with

$N(\eta, \epsilon) = \max_i N_i$
$h(\eta) = \max_i \exp\{-T(\theta_i)\}.$


The author is obliged to Prof. Wald for his kindness in making his paper 
available to the author. 




A NOTE ON RANDOM WALK 
By Herbert T. David
The Johns Hopkins University Institute for Cooperative Research 


A random walk is defined as a series of discrete steps along the real line, here
denoted by $I$. Each step is represented by the chance variable $X$, with sectionally
continuous density function $f(x)$. The walk begins at any point $a$ of $I$, and
continues until a step carries us outside some subregion $\Omega$ of $I$. In this note, $\Omega$ is
taken as a finite interval with upper bound $D$ and lower bound $D - y$. The
chance variables $N$ and $Z$ are, respectively, the number of steps required to end
the walk, and the endpoint of the walk. The range of $Z$ always excludes $\Omega$.

Below, we define $x = D - a$, and consider $E(N)$ as a function $G(x, y)$ of $x$
and $y$. Under specified conditions, a differential equation (32) is derived, relating
$G(0, y)$ and $G(x, y)$.


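Let

(1)    $\psi_1(t) = f(t - a)$

(2)    $\psi_n(t) = \int \cdots (n-1) \cdots \int \prod_{i=1}^{n-1} f(g_i)\; f\!\left(t - a - \sum_{i=1}^{n-1} g_i\right) dg_1 \cdots dg_{n-1}, \quad n \ge 2,$

where the integration extends over the region

$\left[a + \sum_{i=1}^{j} g_i\right] \in \Omega$ for $j: 1, 2, \ldots, n - 1$.

Then

$P\{Z \in \omega_t, N = n\} = \int_{\omega_t} \psi_n(t)\, dt$ for $\omega_t \subset I - \Omega$

$P\{Z \in \omega_t, N = n\} = 0$ for $\omega_t \subset \Omega$.

Hence

(3)    $P\{N = n\} = \int_{I - \Omega} \psi_n(t)\, dt, \qquad E(N) = \sum_{i=1}^{\infty} i \int_{I - \Omega} \psi_i(t)\, dt.$

The transformation $[h_j = a + \sum_{i=1}^{j} g_i;\ j: 1, \ldots, n - 1]$ gives for $\psi_n(t)$
the more convenient expression

(4)    $\psi_n(t) = \int_\Omega \cdots (n-1) \cdots \int_\Omega f(h_1 - a) \prod_{i=2}^{n-1} f(h_i - h_{i-1})\, f(t - h_{n-1})\, dh_1 \cdots dh_{n-1}.$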


(4) 










The n-fold integral $\int_I \psi_n(t)\, dt$ is absolutely convergent, hence may be
integrated first with respect to $t$. This gives, keeping the notation of (4),

(5)    $\int_I \psi_n(t)\, dt = \int_\Omega \psi_{n-1}(h_{n-1})\, dh_{n-1}.$

Assuming that $E(N)$ remains finite for all considered $a$ and $\Omega$, series (3) may be
rearranged, giving $E(N) = \sum_{i=1}^{\infty} B_i$, where

$B_i = \sum_{j=i}^{\infty} \int_{I - \Omega} \psi_j(t)\, dt.$

Now, $B_1 = \sum_{i=1}^{\infty} P\{N = i\} = 1$. Also, using (5) and induction on $i$, it is readily
shown that $B_i = \int_\Omega \psi_{i-1}(t)\, dt$, so that

(6)    $E(N) = 1 + \sum_{i=1}^{\infty} \int_\Omega \psi_i(t)\, dt.$

Define transformations $T_n$: $[g_i = D - h_i,\ i: 1, \ldots, n - 1;\ g_n = D - t]$.
Substituting expressions (1) and (4) in (6), transform the jth term of the
summation by $T_j$. This gives

(7)    $E(N) = 1 + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, dg_1 \cdots dg_n$

where $x = D - a$.

By (7), $E(N)$ is a function of $x$ and $y$; hence we write $E(N) = G(x, y)$.

Define:

$M(k)$: Max $f(t)$ for $|t| < k$.

$K$: Any number satisfying $K < [1 - \epsilon]/M(K)$.

$R$: Any region $[-\infty < x < \infty;\ 0 < y < K]$.

$M$: Max $f(t)$.

$L$: Any number satisfying $L < [1 - \epsilon]/M$.

$R'$: Any region $[-\infty < x < \infty;\ 0 < y < L]$.

In the ensuing argument, we shall assume that

(8)    $(x, y) \in R.$

This condition restricts certain one-dimensional and two-dimensional variables
to regions over which some infinite series are uniformly convergent with respect
to these variables. Uniform convergence is required to validate term-by-term
differentiations and integrations, and to establish the continuity in one or two
variables of certain functions represented by series.

Arguments dealing with the solution of integral equations (17), (20) and (25)
are valid only under the more restrictive condition

(9)    $(x, y) \in R'$

this being the general sufficiency condition for the existence of solutions.
However, (17) and (20) enter the argument with respect only to the derivation of
equation (21) which could have been derived, though in a more cumbersome
manner, by a term by term comparison of the series expressions for $[\lambda_{01}(x, y)][G(y, y)]$
and for $[G_{01}(x, y)][\lambda(y, y)]$, this latter approach being valid under (8).
Similarly, (25) is used only in obtaining (27), which could have been obtained
by a direct manipulation of the series expression for $G(x, y)$, this approach also
being valid under (8). Hence, all subsequent derivations hold, as long as $(x, y) \in R$.

By (8), we may interchange summation and integration with respect to $g_1$
in (7). This gives

(10)    $G(x, y) = 1 + \int_0^y f(x - g)\, G(g, y)\, dg.$
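Equation (10), read as G(x, y) = 1 + ∫₀ʸ f(x − g) G(g, y) dg, is a linear integral equation of the second kind for the expected number of steps, and it can be solved approximately on a grid. The following is a minimal sketch, assuming a standard normal step density and a trapezoidal discretization with 41 nodes (the density, the grid size, and all function names are illustrative assumptions, not part of the note). With y = 1 the product of y and the maximum of f is below one, so the kernel is a contraction.

```python
import math


def normal_density(t):
    """Standard normal density, used here as a hypothetical step density f."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - tail) / a[r][r]
    return x


def expected_steps_on_grid(f, y, m=41):
    """Discretize G(x) = 1 + int_0^y f(x - g) G(g) dg at m trapezoid nodes in [0, y]."""
    h = y / (m - 1)
    nodes = [i * h for i in range(m)]
    weights = [h] * m
    weights[0] = weights[-1] = 0.5 * h
    matrix = [[(1.0 if i == j else 0.0) - f(nodes[i] - nodes[j]) * weights[j]
               for j in range(m)] for i in range(m)]
    return nodes, solve_linear(matrix, [1.0] * m)


nodes, g_values = expected_steps_on_grid(normal_density, 1.0)
```

Here `g_values[0]` approximates G(0, y) and `g_values[-1]` approximates G(y, y); for a symmetric density the two should agree.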


(11)    Assume that $f(t)$ has a continuous derivative everywhere.

Then $f(t)$ is continuous and $G(x, y)$ is continuous by (7) and (8). Hence

(12)    $f(x - g)G(g, y)$ and $\dfrac{\partial}{\partial x} f(x - g)G(g, y)$ are continuous in $(x, g)$

(13)    $f(x - g)G(g, y)$ is continuous in $(g, y)$.

Let $G_{ij}(x, y)$ denote

$\dfrac{\partial^i}{\partial x^i} \dfrac{\partial^j}{\partial y^j} G(x, y).$

Then, by (12), we may differentiate (10) with respect to $x$, and, since
$f_{10}(x - g) = -f_{01}(x - g)$, an integration by parts yields

(14)    $G_{10}(x, y) = f(x)G(0, y) - f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{10}(g, y)\, dg.$

Further, under (8), $G_{01}(x, y)$ may be obtained by differentiating (7) term by
term, and is continuous in $(x, y)$. Hence, $f(x - g)G_{01}(g, y)$ is continuous in $(g, y)$,
and we may differentiate (10) with respect to $y$, giving

(15)    $G_{01}(x, y) = f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{01}(g, y)\, dg.$

Adding (14) to (15), dividing by $G(0, y)$ which is always greater or equal to 1,
and letting

(16)    $\lambda(x, y) = [G_{10}(x, y) + G_{01}(x, y)]/G(0, y)$

we obtain

(17)    $\lambda(x, y) = f(x) + \int_0^y f(x - g)\, \lambda(g, y)\, dg.$








Under (9), (17) defines a function

(18)    $\lambda(x, y) = f(x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n)\, dg_1 \cdots dg_n.$

By (8), this function is continuous in $(x, y)$ and may be differentiated term by
term with respect to $y$. Further, $\lambda_{01}(x, y)$ thus gotten is continuous in $(x, y)$, so
that $f(x - g)\lambda_{01}(g, y)$ is continuous in $(g, y)$. Hence, (17) may be differentiated
with respect to $y$, giving

(19)    $\lambda_{01}(x, y) = f(x - y)\lambda(y, y) + \int_0^y f(x - g)\, \lambda_{01}(g, y)\, dg.$

Since, under (9), the integral equation

(20)    $a(x, y) = f(x - y) + \int_0^y f(x - g)\, a(g, y)\, dg$

has a unique continuous solution for every fixed $y$, (15) and (19) give

(21)    $\dfrac{\lambda_{01}(x, y)}{\lambda(y, y)} = \dfrac{G_{01}(x, y)}{G(y, y)}.$

Hence

$\dfrac{\int_0^y \lambda_{01}(x, y)\, dx}{\lambda(y, y)} = \dfrac{\int_0^y G_{01}(x, y)\, dx}{G(y, y)}$

and

(22)    $\dfrac{\frac{d}{dy} \int_0^y \lambda(x, y)\, dx}{\lambda(y, y)} = \dfrac{\frac{d}{dy} \int_0^y G(x, y)\, dx}{G(y, y)}.$

(23)    Let $f(t) = f(-t)$.

Then it is obvious from the definition that

(24)    $G(0, y) = G(y, y).$

Further, by (15),

(25)    $\dfrac{G_{01}(x, y)}{G(y, y)} = f(x - y) + \int_0^y f(x - g)\, \dfrac{G_{01}(g, y)}{G(y, y)}\, dg$

so that, under (9), (25) gives for $G_{01}(x, y)/G(y, y)$ the unique expression

$f(x - y) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n - y)\, dg_1 \cdots dg_n$

which, by (23), is equal to

$f(y - x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n.$

Since, under (8), we may interchange summation and integration with respect
to $x$, it follows that

(26)    $\int_0^y \dfrac{G_{01}(x, y)}{G(y, y)}\, dx = \int_0^y f(y - x)\, dx + \sum_{n=1}^{\infty} \int_0^y \cdots (n+1) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n\, dx$

which, by a change of integration indices and a referral to (7), is seen to equal
$[G(y, y) - 1]$. (26) thus gives

(27)    $\int_0^y G_{01}(x, y)\, dx = G(y, y)[G(y, y) - 1].$

Further, by (16), (24), and (27),

(28)    $\int_0^y \lambda(x, y)\, dx = G(0, y) - 1$

so that

(29)    $\dfrac{d}{dy} \int_0^y \lambda(x, y)\, dx = \dfrac{d}{dy} G(0, y)$

while (24) and (27) also yield

(30)    $\dfrac{d}{dy} \int_0^y G(x, y)\, dx = [G(0, y)]^2.$

Hence, by (22), (29), and (30),

(31)    $\lambda(y, y) = \dfrac{d}{dy} G(0, y) \Big/ G(0, y).$


Finally, substituting (31) in (21), and remembering the definition of $\lambda$ given in
(16), we get, using (24),

(32)    $G(0, y)[G_{11}(x, y) + G_{02}(x, y)] = \dfrac{d}{dy} G(0, y)\, [G_{10}(x, y) + 2G_{01}(x, y)].$

The conditions under which (32) holds are, in summary, (8), (11), and (23).


If $f(t)$ has an expansion

(33)    $f(t) = \sum_{i=0}^{\infty} A_i t^i; \quad |t| < T$

it is clear from (7) that

(34)    $G(x, y) = \sum_{i,j=0}^{\infty} B_{ij} x^i y^j$


for (x, y) « S, where S: (7) <r < T1;0 Sy < T14+ Ti); T < 0,7, < T. 










Substituting (34) in (32), and equating coefficients of like powers of $(x, y)$,
we obtain the recursion formulae

(35)    $\sum_{j+k=n} B_{ij} B_{0k}\, j(2k - j + 1) = \sum_{j+k=n-1} B_{i+1,j} B_{0k}\, (i + 1)(j - k); \quad i = 0, 1, \ldots.$

From (10), it is readily verified that $B_{i0} = 0$ for $i \ne 0$, so that equations (35)
give solutions for the $B_{ij}$ in terms of the $B_{0k}$. These solutions are of interest
since they show a one-to-one correspondence between the functions $G(0, y)$
and $G(x, y)$, for $(x, y) \in [R \cap S]$.




NUMERICAL INTEGRATION FOR LINEAR SUMS OF EXPONENTIAL 
FUNCTIONS 


By Robert E. Greenwood
The University of Texas and the Institute for Numerical Analysis¹


1. Introduction. The methods of numerical integration going by the names
trapezoidal rule, Simpson's rule, Weddle's rule, and the Newton-Cotes formulae
are of the type

(1)    $\int_{-1}^{1} f(x)\, dx \sim \sum_{i=0}^{n} \lambda_{in} f(x_{in})$

where the abscissae $\{x_{in}\}$ are uniformly distributed on a finite interval, chosen
as $(-1, 1)$ for convenience,

(2)    $x_{in} = -1 + \dfrac{2i}{n}, \quad i = 0, 1, 2, \ldots, n,$

and where the set of constants $\{\lambda_{in}\}$ depend on the name of the rule and the value
of $n$ but not on the function $f(x)$. Throughout this note all abscissae will be
assumed to be uniformly distributed on $(-1, 1)$ unless the contrary is explicitly
stated.

Since correspondence relation (1) involves $(n + 1)$ constants $\{\lambda_{in}\}$, it might
be possible to choose $(n + 1)$ arbitrary functions $g_j(x)$, $j = 0, 1, 2, \ldots, n$,
and require that the set $\{\lambda_{in}\}$ be the solution, if such exists, of the $(n + 1)$
simultaneous linear equations

(3)    $\int_{-1}^{1} g_j(x)\, dx = \sum_{i=0}^{n} \lambda_{in}\, g_j(x_{in}), \quad j = 0, 1, 2, \ldots, n.$

Indeed, the selection

(4)    $g_j(x) = x^j, \quad j = 0, 1, 2, \ldots, n$

will give a set of $(n + 1)$ simultaneous equations of form (3) and the solution $\{\lambda_{in}\}$
is the set of Newton-Cotes weights for that value of $n$. The numerical evaluation
of $\{\lambda_{in}\}$ is best accomplished by other and more sophisticated methods,
however.²


¹ This work was performed with the financial support of the Office of Naval Research of
the Navy Department.

Because of linearity in both the integral and the finite summation, once the 
constants {A;,} have been determined for a specific set of functions {9,(x)}, 
correspondence relation (1) is exact for any linear combination of that funda- 
mental set. Thus, for example, for the fundamental set (4), correspondence 
relation (1) with the appropriate values {A;,} is exact for all polynomials of 
degree less than or equal to n. 

Although tradition favors the set of functions (4), there is nothing compelling
about such a selection. Indeed, two other possible choices might be

(5)    $g_j(x) = e^{jx}, \quad j = 0, 1, 2, \ldots, n,$

and

(6)    $g_j(x) = e^{jx}, \quad j = -m, -m+1, \ldots, 0, 1, \ldots, m-1, m; \quad n = 2m.$

These choices would seem to be appropriate whenever numerical methods are
being applied to exponential growth curves or exponential decay curves.


2. Use of the basic set $g_j(x) = e^{jx}$. If integration relation (1) be made exact
for the set $\{e^{jx}\}$, $j = 0, 1, \ldots, n$, with evenly spaced $x$ abscissae, the set (3) of
$(n + 1)$ simultaneous linear equations in the unknowns $\{\lambda_{in}\}$, $i = 0, 1, \ldots, n$,
is obtained. Call the solution of this system $\{a_{in}\}$; solution values for $n = 1, 2,
3, 4, 5, 6$ are tabulated below.

For the symmetric case where integration relation (1) is made exact for
$\{e^{jx}\}$, $j = -m, -m+1, \ldots, m-1, m$; $n = 2m$, a similar but different set of
linear equations (3) results for the unknowns $\{\lambda_{in}\}$. Call the solution of this
system $\{b_{in}\}$. As implied above, only even values of $n$ are used in order to preserve
the symmetry, and values of $\{b_{in}\}$ are tabulated below for $n = 2, 4, 6$.


  n   i    a_in                   b_in

  1   0     1.31303 5285
  1   1     [missing in source]

  2   0     0.21805 032            0.32260 6237
  2   1     1.49780 742            1.35478 755
  2   2     0.28414 226            0.32260 6237

  3   0     0.51324 284
  3   1     [missing in source]
  3   2     1.08155 527
  3   3     0.18075 134

  4   0    -0.13716 6397           0.15048 171
  4   1     1.40098 548            0.73243 318
  4   2    -0.30895 914            0.23417 022
  4   3     0.91710 903            0.73243 318
  4   4     0.12803 103            0.15048 171

  5   0     0.68919 3
  5   1    -1.07644 3
  5   2     2.12534 6
  5   3    -0.63595 6
  5   4     0.79933
  5   5     0.09852 18

  6   0    -0.83607                0.09443 5
  6   1     3.54128                0.53464 7
  6   2    -3.88102                0.01139 3
  6   3     3.32254                0.71905 0
  6   4    -0.94685                0.01139 3
  6   5     0.72075                0.53464 7
  6   6     0.07937 57             0.09443 5


² Whittaker and Robinson, The Calculus of Observations, 4th Edition, (1946), London,
pp. 152-156.


The computing service of the Institute for Numerical Analysis has supplied the author 
with most of the coefficients tabulated above. 
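The weights can be recomputed directly from the linear system (3). The sketch below is one possible way to do so, not the method used by the computing service: it assumes abscissae at −1 + 2i/n, uses the exact integrals of e^{jx} over (−1, 1), namely 2 for j = 0 and 2 sinh(j)/j otherwise, and solves the system by plain Gaussian elimination. All function names are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - tail) / a[r][r]
    return x


def exponential_rule_weights(exponents, n):
    """Weights making relation (1) exact for exp(j x), j in exponents, on n + 1 even nodes."""
    nodes = [-1.0 + 2.0 * i / n for i in range(n + 1)]
    matrix = [[math.exp(j * x) for x in nodes] for j in exponents]
    rhs = [2.0 if j == 0 else 2.0 * math.sinh(j) / j for j in exponents]
    return nodes, solve_linear(matrix, rhs)


def positive_power_weights(n):
    """The a_in of this section: exactness for exp(j x), j = 0, ..., n."""
    return exponential_rule_weights(list(range(n + 1)), n)


def symmetric_weights(m):
    """The b_in of this section: exactness for exp(j x), j = -m, ..., m, with n = 2m."""
    return exponential_rule_weights(list(range(-m, m + 1)), 2 * m)


_, a_1 = positive_power_weights(1)
_, b_2 = symmetric_weights(1)
nodes_4, b_4 = symmetric_weights(2)
```

For n = 1 and n = 2 the computed values can be compared with the leading entries of the table; the system becomes less well conditioned as n grows, so higher orders may call for more careful arithmetic.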


3. Estimates of the error term. The choices of the coefficients $\{a_{in}\}$ and
$\{b_{in}\}$ are such that integration relation (1) is exact whenever

(7)    $f(x) = A_0 + A_1 e^{x} + \cdots + A_n e^{nx}$ and $\lambda_{in} = a_{in}$,

and whenever

(8)    $f(x) = B_{-m} e^{-mx} + B_{-m+1} e^{(-m+1)x} + \cdots + B_0 + \cdots + B_m e^{mx}$ and $\lambda_{in} = b_{in}$.

When $f(x)$ is not of these prescribed forms, the error in using correspondence (1)
may be of some importance. By making the transformation

(9)    $u = e^{x}, \quad f(x) = f(\log u) = g(u)$

integration relation (1) becomes

(10)    $\int_{e^{-1}}^{e} \dfrac{g(u)}{u}\, du \sim \sum_{i=0}^{n} \lambda_{in}\, g(u_{in})$

where the $\{u_{in}\}$ are not evenly distributed. By approximating $g(u)$ by its Taylor's
series with a remainder term, the following expressions for the error in using
correspondence (1) can be obtained:

Using the coefficients {a;,}, 


(< _ \" 
eee n n+l 
(11) “rror < - tl E a Zz | din | | max (= #) ste) | 


(n+ 1)! i=0 -1<z<1 
and, using the coefficients {Din}, 




(12) = "Qm + 1)! 


ti ,2m 
m ° > id 
i=0 


7 d 2m+1 
Es (« , fa) |. 


Neither of these error expressions can be said to be very practical in actual 


computation, and neither appears suitable for establishing convergence proper- 
ties of the type 


e™ — em 2m | b; iis 
Error < = a + me] 


n 1 

no j= Lat 
However, both (11) and (12) reduce to zero when f(x) is of the form prescribed 
by (7) or (8) respectively. 


4. Numerical examples. As illustrative numerical examples, the case $n = 4$
was selected and several typical functions were integrated approximately by the
positive power exponential rule, the symmetrical exponential rule and the
Newton-Cotes formula,

$\int_{-1}^{1} f(x)\, dx \sim \tfrac{1}{45}\left[7f(-1) + 32f(-\tfrac{1}{2}) + 12f(0) + 32f(\tfrac{1}{2}) + 7f(1)\right].$

Values of $\{a_{i4}\}$ and $\{b_{i4}\}$ are given in the tables in part 2. The typical functions
used were $x^2$, $e^{2x}$, $1/(x + 3)$, $e^{-x^2}$, $xe^{x}$, $x^6$, and $e^{2.2x}$. The following results were
obtained:


  Function     Positive Power   Symmetrical     Newton-Cotes    True Value to
               Exponential      Exponential                     8 Decimals

  x^2           .5703 8827       .6671 8001      .6666 6666      .6666 6667
  e^{2x}       3.6268 6044      3.6268 6041     3.6317 3108     3.6268 6041
  1/(x + 3)     .6828 6353       .6931 5792      .6931 7460      .6931 4718
  e^{-x^2}     1.4930 1396      1.4857 2754     1.4887 4582     1.4936 4827
  x e^{x}       .7292 4338       .7353 6007      .7361 7480      .7357 5888
  x^6           .0270 8487       .3238 5196      .3333 3332      .2857 1429
  e^{2.2x}     4.0527 7287      4.0530 7585     4.0607 7415     4.0519 1379





From this tabulation, it would appear that the symmetrical exponential
method compares favorably with the Newton-Cotes method for such typical
functions as $1/(x + 3)$, $e^{-x^2}$, $xe^{x}$, $x^6$, and $e^{2.2x}$. Note that the choice of $x^2$ or $e^{2x}$
is not really a fair choice when comparing these two methods, since
Newton-Cotes is derived so as to give exactness for $x^2$ and the symmetrical
exponential so as to give exactness for $e^{2x}$.
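The Newton-Cotes column of the tabulation is easy to reproduce. The following short sketch applies the five-point formula with weights (7, 32, 12, 32, 7)/45 to three of the test functions; the function name `newton_cotes_five_point` is an illustrative choice.

```python
import math


def newton_cotes_five_point(f):
    """Five-point Newton-Cotes rule on (-1, 1) with evenly spaced abscissae."""
    return (7.0 * f(-1.0) + 32.0 * f(-0.5) + 12.0 * f(0.0)
            + 32.0 * f(0.5) + 7.0 * f(1.0)) / 45.0


reciprocal = newton_cotes_five_point(lambda x: 1.0 / (x + 3.0))
sixth_power = newton_cotes_five_point(lambda x: x ** 6)
exponential = newton_cotes_five_point(lambda x: math.exp(2.0 * x))
```

The true values of the three integrals are log 2, 2/7, and sinh 2 respectively, so the errors of the rule can be read off by subtraction.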
















SMOOTHEST APPROXIMATION FORMULAS 






By Arthur Sard¹


Queens College 






Introduction. Consider a process of approximation which operates on a
function $x = x(t)$. The error in the process may be thought of as a sum $R + \delta A$,
where $R$ is the error that would be present if $x$ were exact and $\delta A$ is the error due
to errors in $x$. (Precise definitions are given below.) Suppose that one wishes to
choose one process $A$ from a class $\mathcal{A}$ of processes. In some situations it is
appropriate to base the choice on $R$ alone²; in others it is appropriate to consider $\delta A$.

The primary purpose of the present note is to formulate a criterion of smoothest
approximation: That $A$ in $\mathcal{A}$ is smoothest which minimizes the variance of
$\delta A$. A criterion based on both $R$ and $\delta A$ is also suggested. (Sections 1 and 2.)
Smoothest approximate integration formulas of one type are derived in Section 3.

Progress in the technique of estimating the covariance function of the errors
in $x$ will lead to further applications of the criterion of smoothest approximation.



















1. Approximation of a functional. Suppose that $X$ is a space of functions
$x = x(t)$ each of which is continuous on $a \le t \le b$. Let $f[x]$ be a functional
defined on $X$; that is, $f[x]$ is a real number defined for each $x \in X$. For example,
$X$ might be the space of functions with second derivatives on $[a, b]$ and $f[x]$ might
be $x''(u)$, where $u$ is a fixed number in $[a, b]$.

Suppose that $f[x]$ is to be approximated by a Stieltjes integral

(1)    $A = \int_a^b x(t)\, d\alpha(t), \quad x \in X,$

where $\alpha$ is a function of bounded variation. The remainder in the approximation
of $f[x]$ by $A$ is

$R = A - f[x].$

If the approximation (1) operates on $x + \delta x$ instead of $x$, the result is
$A + \delta A = \int_a^b (x + \delta x)\, d\alpha$; and the error in the approximation of $f[x]$ by $A + \delta A$ is $R + \delta A$,
where

(2)    $\delta A = \int_a^b \delta x(t)\, d\alpha(t).$

Consider a class $\mathcal{A}$ of approximations $A$, each of the form (1). We shall propose
a criterion for characterizing the "smoothest $A$" in $\mathcal{A}$, relative to the covariance
function of the errors $\delta x$.






¹ The author gratefully acknowledges financial support received from the Office of Naval
Research.
² "Best approximate integration formulas; best approximation formulas," Amer. Jour.
of Math., Vol. 71 (1949), pp. 80-91.











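Assume that $\delta x = \delta x(t)$ is a stochastic process with mean zero³ and covariance
function $\sigma(t, u) = E[\delta x(t)\, \delta x(u)]$. Then, by (2), $\delta A$ is a stochastic variable; and⁴

(3)    $E\,\delta A = E \int_{t=a}^{b} \delta x\, d\alpha = \int_{t=a}^{b} 0\, d\alpha = 0,$

       $v = E(\delta A)^2 = E\left[\int_a^b \delta x(t)\, d\alpha(t) \int_a^b \delta x(u)\, d\alpha(u)\right] = \int_a^b \int_a^b \sigma(t, u)\, d\alpha(t)\, d\alpha(u).$

CRITERION. That $A$ (if any) in $\mathcal{A}$ is smoothest which minimizes the variance
$v$ of $\delta A$.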

In particular cases, this criterion (least squares) has been proposed and used 
by Chebyshev and others. An application to approximate integration is given in 
section 3 below. 
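When the function of bounded variation is a step function with jumps w_i at points t_i, the approximation is a weighted sum of ordinates and the variance of δA becomes the double sum of w_i w_j σ(t_i, t_j). The sketch below evaluates this double sum for two five-point integration rules on (−1, 1), assuming uncorrelated errors of unit variance; the covariance model, the nodes, and the function names are illustrative assumptions.

```python
def variance_of_delta_a(weights, nodes, covariance):
    """Double sum of sigma(t, u) over the jumps of a step function alpha."""
    return sum(w_t * w_u * covariance(t, u)
               for w_t, t in zip(weights, nodes)
               for w_u, u in zip(weights, nodes))


def uncorrelated(t, u):
    """Hypothetical covariance: unit variance, no correlation between ordinates."""
    return 1.0 if t == u else 0.0


nodes = [-1.0, -0.5, 0.0, 0.5, 1.0]
trapezoid = [0.25, 0.5, 0.5, 0.5, 0.25]
newton_cotes = [w / 45.0 for w in (7, 32, 12, 32, 7)]

v_trapezoid = variance_of_delta_a(trapezoid, nodes, uncorrelated)
v_newton_cotes = variance_of_delta_a(newton_cotes, nodes, uncorrelated)
```

Under this covariance the variance reduces to the sum of squared weights, and the rule with the smaller value is the smoother of the two in the sense of the criterion.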

One may extend this discussion to cases in which the approximations A 
involve derivatives of x. 

Remark. The criterion of best approximation² may be combined with the
above criterion of smoothest approximation as follows: That A (if any) in $\mathcal{A}$ is
the best compromise which minimizes a specified combination of the variance
of $\delta A$ and the modulus of R. Here it is assumed that the remainders R satisfy the
conditions for the existence of the modulus.²


2. Approximation of a function. One may extend the preceding discussion
to the case in which y = f[x] is an operation to a space of functions $y = y(u)$,
$\bar a \le u \le \bar b$; and in which the approximation of f[x] is

$A = \int_a^b x(t)\, d_t \alpha(t, u), \qquad x \in X,$

where, for each u, $\alpha$ is a function of bounded variation in t. Then, for each u,
$\delta A$ has a variance v(u). Criterion. That A (if any) in a class of approximations is
smoothest which minimizes v(u) for all u; failing such an A, that A (if any) is
smoothest which minimizes the integral of v(u), or alternatively, the supremum
of v(u), over $\bar a \le u \le \bar b$.


3 The essential point here is that $E\, \delta x(t) = m(t)$ be known for each t; for given m(t), one
could and would replace $x + \delta x$ by $x + \delta x - m$.

4 We assume here that the integrals in (3) exist and that the inversions of E and $\int d\alpha$
are valid. For this it is sufficient that $\delta x$ be integrable relative to the product measure $\alpha\omega$
for all functions $\alpha$ corresponding to elements of $\mathcal{A}$, where $\omega$ is the measure in the underlying
probability space relative to which E is the operator $\int d\omega$. Cf. J. L. Doob, "Probability in
function space," Bull. Amer. Math. Soc., Vol. 53 (1947), especially pp. 26, 27.

5 The approximate integration formulas of this section are of such a nature that one
would expect them to be known. The values of J at the end are probably new.


3. Smoothest approximate integration formulas in a particular case.⁵ Let m
and n be fixed integers; $m \ge 1$, $n \ge 0$. Let $\mathcal{A} = \mathcal{A}_{m,n}$ be the class of all
approximations of

$\int_{-m/2}^{m/2} x(t)\, dt = f[x]$

of the form

$A = \sum_{i=-m/2}^{m/2} b_i\, x(i),$

the m + 1 constants $b_i$ being such that A = f[x] whenever x(t) is a polynomial of
degree n. Throughout this section i is to range over the m + 1 values i = -m/2,
-m/2 + 1, ..., +m/2. Suppose that the errors $\delta x(i)$ are independent, with
common variance $\sigma^2$, and with mean zero. Then $\alpha(t)$ is a step function with jumps
$b_i$ at t = i; and

$v = \sigma^2 \sum_i b_i^2.$


The smoothest approximation in $\mathcal{A}_{m,n}$ is the one for which v is a minimum.
(The m + 1 variables $b_i$ in v are subject to n + 1 constraints due to the condition
that the approximation be exact for degree n. The set $\mathcal{A}_{m,n}$ is empty if and only
if m is less than the largest even integer contained in n.)

If n = 0 or 1, the smoothest formula in $\mathcal{A}_{m,n}$ is the one for which all the
coefficients are equal:

$b_i = m/(m + 1);$

in which case

$v = m^2 \sigma^2/(m + 1).$


If n = 2 or 3, the smoothest formula in $\mathcal{A}_{m,n}$ is characterized by the following
relations:

$b_i = \lambda_0 + \lambda_1 i^2,$
$\lambda_0 = m(2m^2 + 9m - 6)/[2(m - 1)(m + 1)(m + 3)],$
$\lambda_1 = -30m/[(m - 1)(m + 1)(m + 2)(m + 3)];$

in which case

$v/\sigma^2 = \lambda_0 m + \lambda_1 m^3/12.$

Thus, the smoothest approximation in $\mathcal{A}_{6,2}$ or in $\mathcal{A}_{6,3}$ is the following:

$A = \tfrac{1}{2}[x(-3) + x(3)] + \tfrac{6}{7}[x(-2) + x(2)] + \tfrac{15}{14}[x(-1) + x(1)] + \tfrac{8}{7}\, x(0).$
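
The closed forms for n = 2 or 3, namely $b_i = \lambda_0 + \lambda_1 i^2$ with $\lambda_0 = m(2m^2 + 9m - 6)/[2(m-1)(m+1)(m+3)]$ and $\lambda_1 = -30m/[(m-1)(m+1)(m+2)(m+3)]$, can be checked in exact arithmetic. The sketch below uses Python's `fractions` module; the function names are hypothetical, and the exactness check (integrating 1, t, t², t³ over [-m/2, m/2]) is an added illustration rather than part of the paper.

```python
from fractions import Fraction

def smoothest_weights_n2(m):
    """Weights b_i = lam0 + lam1 * i**2 at i = -m/2, ..., m/2 (needs m >= 2)."""
    m = Fraction(m)
    lam0 = m * (2 * m**2 + 9 * m - 6) / (2 * (m - 1) * (m + 1) * (m + 3))
    lam1 = -30 * m / ((m - 1) * (m + 1) * (m + 2) * (m + 3))
    nodes = [-m / 2 + j for j in range(int(m) + 1)]
    weights = [lam0 + lam1 * i**2 for i in nodes]
    v_over_sigma2 = lam0 * m + lam1 * m**3 / 12
    return nodes, weights, v_over_sigma2

def exact_moment(m, degree):
    """Integral of t**degree over [-m/2, m/2]."""
    if degree % 2 == 1:
        return Fraction(0)
    return 2 * Fraction(m, 2) ** (degree + 1) / (degree + 1)

nodes, weights, v_over_sigma2 = smoothest_weights_n2(6)
moment_errors = [sum(w * t**d for w, t in zip(weights, nodes)) - exact_moment(6, d)
                 for d in range(4)]
```

For m = 6 the weights come out as 1/2, 6/7, 15/14, 8/7, 15/14, 6/7, 1/2, and the sum of their squares agrees with $\lambda_0 m + \lambda_1 m^3/12 = 39/7$.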


By the method of Lagrange's multipliers, one may establish the following
relations for the smoothest formula in $\mathcal{A}_{m,n}$. Here i has the same range of values
as before; $\mu$ and $\nu$ range over 0, 1, ..., [n/2].

$b_i = \sum_\mu \lambda_\mu i^{2\mu},$

$v/\sigma^2 = \sum_\mu \lambda_\mu c_\mu,$

where

$c_\nu = m^{2\nu+1}/[4^\nu (2\nu + 1)],$

and the $\lambda_\mu$ are determined by the equations

$\sum_\mu \lambda_\mu \sum_i i^{2\mu+2\nu} = c_\nu.$
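
The Lagrange-multiplier relations amount to a small linear system: with $c_\nu = m^{2\nu+1}/[4^\nu(2\nu+1)]$, the multipliers satisfy $\sum_\mu \lambda_\mu \sum_i i^{2\mu+2\nu} = c_\nu$, and then $b_i = \sum_\mu \lambda_\mu i^{2\mu}$. A minimal sketch of solving it exactly is given below. The helper names are hypothetical, and the Gauss-Jordan routine is a generic choice made for this illustration, not a method prescribed by the paper.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve a small square system exactly by Gauss-Jordan elimination.

    Raises StopIteration if the system is singular (no pivot found).
    """
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [entry / aug[col][col] for entry in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def smoothest_formula(m, n):
    """Return (nodes, weights, v / sigma^2) for the smoothest formula."""
    half = n // 2
    nodes = [Fraction(-m, 2) + j for j in range(m + 1)]
    c = [Fraction(m) ** (2 * nu + 1) / (4**nu * (2 * nu + 1))
         for nu in range(half + 1)]
    moments = [[sum(i ** (2 * mu + 2 * nu) for i in nodes)
                for mu in range(half + 1)]
               for nu in range(half + 1)]
    lam = solve_linear(moments, c)
    weights = [sum(l * i ** (2 * mu) for mu, l in enumerate(lam)) for i in nodes]
    v_over_sigma2 = sum(l * cv for l, cv in zip(lam, c))
    return nodes, weights, v_over_sigma2

nodes_63, weights_63, v_63 = smoothest_formula(6, 3)
nodes_41, weights_41, v_41 = smoothest_formula(4, 1)
```

For n = 0 or 1 the system has a single unknown and returns equal weights m/(m + 1); for n = 2 or 3 it reproduces the two-parameter closed form.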


The class $\mathcal{A}_{m,n}$ is such that for each $A \in \mathcal{A}_{m,n}$ there is a function k(t) with the
following property:²

$f[x] - A = \int_{-m/2}^{m/2} x^{(n+1)}(t)\, k(t)\, dt,$

whenever x is a function with continuous (n + 1)th derivative. The quantity

$J = \int_{-m/2}^{m/2} k(t)^2\, dt$

is useful in appraising R, since

$R^2 \le J \int_{-m/2}^{m/2} [x^{(n+1)}(t)]^2\, dt,$

by Schwarz's inequality.
Values of J for the smoothest formulas are as follows.
n = 0: $J = m^2/[6(m + 1)]$.
n = 1: $J = m^2(3m^2 + 2m + 1)/[360(m + 1)]$.
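
The value of J for n = 0 can be checked directly. For the equal-weight formula, one integration by parts gives a piecewise linear kernel, $k(s) = \sum_{i > s} b_i - (m/2 - s)$ up to sign; this expression is an assumption of the sketch (it is derived here, not quoted from the paper), and the sign does not affect J. The function name is hypothetical.

```python
from fractions import Fraction

def J_equal_weights(m):
    """Integral of k(s)**2 over [-m/2, m/2] for the n = 0 equal-weight formula."""
    half = Fraction(m, 2)
    b = Fraction(m, m + 1)                 # common weight m / (m + 1)
    nodes = [-half + j for j in range(m + 1)]
    total = Fraction(0)
    for j in range(m):
        lo, hi = nodes[j], nodes[j + 1]
        # On (lo, hi) the kernel is k(s) = s + c, where c collects the
        # m - j weights at nodes to the right of the interval.
        c = b * (m - j) - half
        total += ((hi + c) ** 3 - (lo + c) ** 3) / 3
    return total

J_values = {m: J_equal_weights(m) for m in range(1, 7)}
```

Each computed value agrees with $m^2/[6(m + 1)]$; for example m = 2 gives 2/9.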


For n = 2 and 3, and $m \le 6$, the numerical values of J are as follows.

m    J (n = 2)          J (n = 3)
2    1/1,890            1/9,072
3    11/8,960           13/17,920
4    134/33,075         62,539/13,891,500
5    1,865/150,528      136,223/6,322,176
6    8/245              6,683/82,320


For the method of calculation of J, as well as the transformation of J under a
linear transformation of t, the reader may consult the paper².










ON THE POWER FUNCTION OF THE “BEST” t-TEST SOLUTION OF THE 
BEHRENS-FISHER PROBLEM 


By John E. Walsh


The Rand Corporation 


1. Introduction. The Behrens-Fisher problem is concerned with significance
tests for the difference of the means of two normal populations when the ratio
of the variances of the populations is unknown. Denote one population by
$N(a_1, \sigma_1^2)$ and the other by $N(a_2, \sigma_2^2)$, where the notation $N(a, \sigma^2)$ represents a
normal population with mean a and variance $\sigma^2$. Let m sample values be drawn
from $N(a_1, \sigma_1^2)$ and n sample values from $N(a_2, \sigma_2^2)$ where $m \le n$. Then Scheffé
[1] has shown that certain optimum properties are possessed by a t-test solution
he proposed for the Behrens-Fisher problem, in which the numerator of t is based
on the difference of the means of the samples while the denominator is based on
the square root of a function of the sample values which has a $\chi^2$-distribution
with m - 1 degrees of freedom. The purpose of this note is to compare the power
function of this t-test with the power function of the corresponding most powerful
test for the case in which the ratio of variances $\sigma_1^2/\sigma_2^2$ is also known (only
one-sided and symmetrical tests are considered). This comparison is made by
computing the power efficiency (see section 2 for definition) of Scheffé's test.

It is sufficient to limit power efficiency investigations to one-sided tests. As
shown in [2], a symmetrical t-test with significance level $2\alpha$ has the same power
efficiency as the corresponding one-sided t-test with significance level $\alpha$. Equation
(2) of section 2 furnishes an explicit formula whereby approximate power
efficiencies can be computed for a wide range of values of $\alpha$, m, n. Table 1 contains
values of (2) for $\alpha$ = .05, .01 and several values of m and n.

For the situation considered here, a power efficiency of 100r% has the quantitative
interpretation that the given test based on samples of size m and n has
approximately the same power function as the corresponding most powerful
test based on samples of size rm and rn. Intuitively the power efficiency
of a test measures the percentage of available information per observation
which is utilized by that test.


2. Power efficiency derivations. The basic notion of the power efficiency of
a significance test is given in [2]. For the present case the problem is to determine
the value r such that a most powerful test of the same hypothesis (same
significance level) based on rm and rn sample values will have approximately the
same power function as the given t-test based on m and n sample values (from
$N(a_1, \sigma_1^2)$ and $N(a_2, \sigma_2^2)$ respectively). Here the value of $\sigma_1^2/\sigma_2^2$ is assumed to be
known. Then the power efficiency of the given t-test equals 100r%.

If the ratio of variances $\sigma_1^2/\sigma_2^2$ is known, the most powerful significance test
(one-sided and symmetrical) for the difference of means of the two normal
populations is a t-test where the numerator of t is based on the difference of the
two sample means while the denominator is based on the square root of a function
of the sample values and $\sigma_1^2/\sigma_2^2$ which has a $\chi^2$-distribution with m + n - 2
degrees of freedom [1, p. 43]. Thus the problem is that of comparing the power
functions of two t-tests.


TABLE 1

Percentage Power Efficiencies for Certain Values of m and n

α = .05

  m \ n |    4 |    6 |   10 |   15 |   20 |   30 |   50 |  100 |     ∞
      4 | 79.6 | 73.5 | 67.2 | 63.4 | 61.4 | 59.3 | 57.6 | 56.2 |  54.9
      6 |      | [entries illegible in the source]
     10 |      |      | 92.6 | 90.9 | 89.8 | 88.6 | 87.3 | 86.2 |  85.0
     15 |      |      |      | 95.2 | 94.4 | 93.5 | 92.5 | 91.5 |  90.3
     20 |      |      |      |      | 96.4 | 95.7 | 94.9 | 94.0 |  92.9
     30 |      |      |      |      |      | 97.7 | 97.1 | 96.4 |  95.3
     50 |      |      |      |      |      |      | 98.6 | 98.1 |  97.2
    100 |      |      |      |      |      |      |      | 99.3 |  98.6
      ∞ |      |      |      |      |      |      |      |      | 100.0

α = .01

  m \ n |    6 |    8 |   10 |   15 |   20 |   30 |   50 |  100 |     ∞
      6 | 74.9 | 70.2 | 66.7 | 61.2 | 57.9 | 54.3 | 51.1 | 48.6 |  45.9
      8 |      | 81.3 | 78.8 | 74.7 | 72.1 | 69.1 | 66.3 | 63.9 |  61.4
     10 |      |      | 85.3 | 81.9 | 79.8 | 77.2 | 74.7 | 72.5 |  69.9
     15 |      |      |      | 90.4 | 88.9 | 87.0 | 85.0 | 83.1 |  80.7
     20 |      |      |      |      | 92.9 | 91.4 | 89.8 | 88.1 |  85.8
     30 |      |      |      |      |      | 95.3 | 94.1 | 92.8 |  90.7
     50 |      |      |      |      |      |      | [illegible] | [illegible] | 94.5

As stated in section 1, it is sufficient to consider one-sided tests. We find, using
a modification of the normal approximation to the power function of a one-sided
t-test given in [3], that Scheffé's one-sided t-test for the Behrens-Fisher problem
and the corresponding most powerful one-sided test ($\sigma_1^2/\sigma_2^2$ known) have
approximately the same power function when r is chosen so that

$K_\alpha - \delta\sqrt{r}\, \{1 - K_\alpha^2/2[(m + n)r - 2]\}^{1/2} = K_\alpha - \delta\{1 - K_\alpha^2/2(m - 1)\}^{1/2},$

where $\alpha$ is the significance level of the tests, $K_\alpha$ is the value of the standardized
normal deviate exceeded with probability $\alpha$, and $\delta$ is a function of m, n,
$a_1$, $a_2$, $\sigma_1$, $\sigma_2$ and the given hypothetical value of $a_1 - a_2$ being tested. This
condition for the approximate equality of the power functions is reasonably
accurate for the following cases: $\alpha$ = .05, $m \ge 4$; $\alpha$ = .025, $m \ge 5$; $\alpha$ = .01,
$m \ge 6$; $\alpha$ = .005, $m \ge 7$. The accuracy of the approximation increases as m
increases.

Hence a value of r such that the two power functions are approximately equal
is determined by the equation

(1)    $r\{1 - K_\alpha^2/2[(m + n)r - 2]\} = 1 - K_\alpha^2/2(m - 1).$

Let

$A = A(m, \alpha) = 1 - K_\alpha^2/2(m - 1).$

Then solving (1) for the appropriate root yields

$r = \frac{1}{2(m + n)} \left\{ 2 + (m + n)A + K_\alpha^2/2 + \sqrt{[2 + (m + n)A + K_\alpha^2/2]^2 - 8(m + n)A} \right\}.$

Thus the power efficiency of Scheffé's one-sided t-test solution to the Behrens-
Fisher problem, for the case in which the ratio of the variances is also known, is
approximately equal to

(2)    $\frac{50}{m + n} \left\{ 2 + (m + n)A + K_\alpha^2/2 + \sqrt{[2 + (m + n)A + K_\alpha^2/2]^2 - 8(m + n)A} \right\} \%$

for suitable values of $\alpha$ and m.
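
Formula (2), the percentage $\frac{50}{m+n}\{2 + (m+n)A + K_\alpha^2/2 + \sqrt{[2 + (m+n)A + K_\alpha^2/2]^2 - 8(m+n)A}\}$ with $A = 1 - K_\alpha^2/2(m-1)$, is straightforward to evaluate. The sketch below is one way to do so in Python; the function name is hypothetical, and obtaining $K_\alpha$ from `statistics.NormalDist` (Python 3.8 or later) is a convenience assumed here in place of printed normal tables.

```python
from math import sqrt
from statistics import NormalDist

def power_efficiency(m, n, alpha):
    """Approximate percentage power efficiency from formula (2)."""
    k_squared = NormalDist().inv_cdf(1 - alpha) ** 2   # K_alpha squared
    a = 1 - k_squared / (2 * (m - 1))                  # A(m, alpha)
    total = m + n
    base = 2 + total * a + k_squared / 2
    root = sqrt(base * base - 8 * total * a)
    return 50.0 / total * (base + root)

eff_4_4 = power_efficiency(4, 4, 0.05)
eff_10_10 = power_efficiency(10, 10, 0.05)
eff_6_6 = power_efficiency(6, 6, 0.01)
```

These three evaluations agree, to the precision printed, with the Table 1 entries 79.6 (m = n = 4, α = .05), 92.6 (m = n = 10, α = .05) and 74.9 (m = n = 6, α = .01).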
REFERENCES
[1] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the t-distribution,"
Annals of Math. Stat., Vol. 14 (1943), pp. 35-44.
[2] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.
[3] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), p. 376.



















A NOTE ON FISHER’S INEQUALITY FOR BALANCED INCOMPLETE 
BLOCK DESIGNS 


By R. C. Bose


Institute of Statistics, University of North Carolina 





1. An experimental design in which v varieties or treatments are arranged in 
b blocks, is called a balanced incomplete block design if 

(i) Each block has exactly k treatments (k < v) no treatment occurring twice 
in the same block. 

(ii) Each treatment occurs in exactly r blocks. 

(iii) Any two treatments occur together in exactly \ blocks. 

It is easy to see that the parameters v, b, r, k, $\lambda$ of the design satisfy the
relations

(1.0)    $bk = vr,$
(1.1)    $\lambda(v - 1) = r(k - 1).$

Also it is readily seen that

(1.2)    $r > \lambda$

for otherwise with any given treatment every other treatment would occur in
every block. This would make k = v, and the design would become a 'randomised
block design'.

Fisher (1940) showed that a necessary condition for the existence of a balanced
incomplete block design with v treatments and b blocks is

(1.3)    $b \ge v.$
It is the object of this note to give a very simple proof of Fisher’s inequality. 
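2. Consider a balanced incomplete block design with parameters

(2.0)    $v, b, r, k, \lambda$

and let

(2.1)    $n_{ij} = 1$ or $0$

according as the ith treatment does or does not occur in the jth block. Clearly

(2.2)    $\sum_{j=1}^{b} n_{ij}^2 = r,$

(2.3)    $\sum_{j=1}^{b} n_{ij}\, n_{uj} = \lambda \qquad (i \ne u).$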




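If possible let b < v. Consider the v × v matrix

(2.4)    $N = \begin{pmatrix} n_{11} & n_{12} & \cdots & n_{1b} & 0 & \cdots & 0 \\ n_{21} & n_{22} & \cdots & n_{2b} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ n_{v1} & n_{v2} & \cdots & n_{vb} & 0 & \cdots & 0 \end{pmatrix}$

where the last v - b columns of N consist of zeros. It follows from (2.2) and (2.3)
that

(2.5)    $NN' = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}$

where N' denotes the transpose of N.

(2.6)    $\det(NN') = \{r + \lambda(v - 1)\}(r - \lambda)^{v-1} = kr(r - \lambda)^{v-1}$ from (1.1).

But

(2.7)    $\det(NN') = \det N \det N' = 0.$

This makes r = $\lambda$, and contradicts (1.2). Hence the assumption b < v is wrong,
and we must have

(2.8)    $b \ge v.$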
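
The determinant identity in (2.6) can be checked numerically on a small design. The sketch below uses the seven-point design with v = b = 7, r = k = 3, λ = 1 (the Fano plane), which is an example chosen for this illustration and not taken from the note; the `determinant` helper is likewise hypothetical.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    size = len(rows)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det
        det *= rows[col][col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return det

# Blocks of the assumed example design; treatments are numbered 0..6.
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
v, b, r, k, lam = 7, 7, 3, 3, 1

# Incidence matrix N and the product NN'.
N = [[1 if i in block else 0 for block in blocks] for i in range(v)]
gram = [[sum(N[i][j] * N[u][j] for j in range(b)) for u in range(v)]
        for i in range(v)]
det_gram = determinant(gram)

# Replacing the last block by a column of zeros makes det N vanish.
padded = [row[:-1] + [0] for row in N]
det_padded = determinant(padded)
```

Here NN' has r on the diagonal and λ elsewhere, and its determinant equals kr(r - λ)^{v-1} = 576, which is non-zero; a matrix N with a zero column, by contrast, has determinant zero, which is the contradiction the proof exploits.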


REFERENCES 


[1] R. A. Fisher, "An examination of the different possible solutions of a problem in
incomplete blocks," Annals of Eugenics, London, Vol. 10 (1940), pp. 52-75.

[2] F. Yates, "Incomplete randomised blocks," Annals of Eugenics, London, Vol. 7 (1936),
pp. 121-140.




ABSTRACTS OF PAPERS 


(Presented September 1, 1949 at Boulder at the Twelfth Summer Meeting of the Institute) 


1. Structure of Statistical Elements. Duane M. Studley, Foundation Research,
Colorado Springs, Colorado.


Research in logical semantics and in practical elementation has set forth the proposition 
that all words and ideas have set form. As a consequence of this universal proposition 
all notions and conceptions in statistics should be accessible to set-theoretic analysis and 
interpretation. This paper explains the results of a preliminary analysis performed on 
statistical notions and conceptions with a view to a proper organization of definitions and 
conceptions which will, it is hoped, make possible a better and simpler construction of 
statistics from a system of basic notions. 













2. On the Relative Efficiencies of BAN Estimates. Leo Katz, Michigan State 
College, East Lansing, Michigan. 
















J. Neyman, in the Proceedings of the Berkeley Symposium on Mathematical Statistics
and Probability, 1949, proved that χ² minimum estimates with either of two alternative
definitions of χ² are efficient, as also are the maximum likelihood estimates. He also raised
the question whether some of these estimates were better than others. This paper bears
on that question. In making χ² minimum estimates, it is often necessary to avoid small
frequencies by grouping together at least one tail of the distribution. It is with respect to
the parameters of these modified distributions that the χ² estimates are efficient. Define
relative efficiency in these circumstances as the ratio of the variance of an efficient estimator
in the unmodified case to that of one in the modified case. It is shown that, except for a
rectangular probability law, the relative efficiency < 1 and, further, it decreases as the tail
grouping is made wider. Formulae are given for the relative efficiencies of χ² minimum
estimators for Binomial and Poisson probability laws and some representative values
computed to exhibit these effects.


3. Adjustment of an Inverse Matrix Corresponding to Changes in the Elements
of a Given Column or a Given Row of the Original Matrix. Jack Sherman
and Winifred J. Morrison, The Texas Company Research Laboratories,
Beacon, New York.










A simple computational procedure is derived for obtaining the elements $b'_{ij}$ of an nth
order matrix (B') which is the inverse of (A'), directly from the elements $b_{ij}$ of a matrix
(B) which is the inverse of (A), when (A') differs from (A) only in the elements of one
column, say the Sth column. The equations which form the basis of the computation are:

$b'_{Sj} = \frac{b_{Sj}}{\sum_{r=1}^{n} b_{Sr}\, a'_{rS}}, \qquad j = 1, 2, \cdots, n,$

$b'_{ij} = b_{ij} - b'_{Sj} \sum_{r=1}^{n} b_{ir}\, a'_{rS}, \qquad i = 1, 2, \cdots, S - 1, S + 1, \cdots, n; \quad j = 1, 2, \cdots, n.$


Analogous equations are derived for the case that A and A’ differ in the elements of a 
given row rather than a column. 
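
The column-replacement update described in this abstract, $b'_{Sj} = b_{Sj} / \sum_r b_{Sr} a'_{rS}$ for the Sth row and $b'_{ij} = b_{ij} - b'_{Sj} \sum_r b_{ir} a'_{rS}$ for the other rows, can be sketched in a few lines of Python. The function name, the 0-based column index and the 2 × 2 example are assumptions made for this illustration.

```python
from fractions import Fraction

def update_inverse_for_new_column(B, new_column, S):
    """Given B = inverse of A, return the inverse of A', where A' is A with
    column S (0-based) replaced by new_column."""
    n = len(B)
    # w[i] = sum_r b_ir * a'_rS, i.e. B applied to the new column.
    w = [sum(B[i][r] * new_column[r] for r in range(n)) for i in range(n)]
    if w[S] == 0:
        raise ValueError("the modified matrix is singular")
    new_row_S = [B[S][j] / w[S] for j in range(n)]
    updated = []
    for i in range(n):
        if i == S:
            updated.append(new_row_S)
        else:
            updated.append([B[i][j] - new_row_S[j] * w[i] for j in range(n)])
    return updated

# A = [[2, 1], [1, 1]] has inverse B = [[1, -1], [-1, 2]].
B = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]
# Replace the second column of A by (3, 4), giving A' = [[2, 3], [1, 4]].
A_new = [[Fraction(2), Fraction(3)], [Fraction(1), Fraction(4)]]
B_new = update_inverse_for_new_column(B, [Fraction(3), Fraction(4)], 1)
product = [[sum(A_new[i][r] * B_new[r][j] for r in range(2)) for j in range(2)]
           for i in range(2)]
```

The update costs on the order of n² operations, compared with roughly n³ for inverting A' from scratch.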


4. On the Problem of Optimum Classification. Paul G. Hoel, University of
California at Los Angeles.















Let $f_i$, (i = 1, 2, ..., k), be the probability density function of population i and let
$p_i$ be the probability that population i will be sampled. Assume certain differentiability
conditions and moment properties. Then, for known parameters, the probability of a
correct classification will be maximized by choosing the region $M_i$, which corresponds to
classifying into population i, as that part of variable space where $p_i f_i > p_j f_j$, (j = 1, 2, ..., k).
If the parameters are unknown, an asymptotically optimum set of estimates will be
given by the set that minimizes a certain form in the covariances. Among uncorrelated
estimates, maximum likelihood estimates are seen to be asymptotically optimum.

If weight functions, $W_{ij}$, are introduced and the expected value of the loss is minimized,
the same methods of proof show that the region $M_i$ becomes that part of variable space
where $\sum_{r=1}^{k} p_r f_r (W_{rj} - W_{ri}) > 0$, (j = 1, 2, ..., k), and that the criterion for an
asymptotically optimum set of estimates is of the same form as the preceding criterion.
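
For known parameters, the rule stated in this abstract assigns an observation to the population i for which $p_i f_i$ is largest. A minimal sketch follows; the two unit-variance normal populations and the prior probabilities are assumed values chosen only for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist

def classify(x, priors, densities):
    """Return the index i maximizing priors[i] * densities[i](x)."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    return max(range(len(scores)), key=scores.__getitem__)

# Assumed example: two unit-variance normal populations with means 0 and 2.
densities = [NormalDist(0.0, 1.0).pdf, NormalDist(2.0, 1.0).pdf]
equal_priors = [0.5, 0.5]
skewed_priors = [0.9, 0.1]

near_boundary = [classify(x, equal_priors, densities) for x in (0.9, 1.1)]
shifted = [classify(x, skewed_priors, densities) for x in (1.5, 2.5)]
```

With equal priors the boundary between the two regions lies midway between the means; making the first population more probable moves the boundary toward the second mean.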







5. Optimal Linear Prediction of Stochastic Processes whose Covariances are
Green's Functions. C. L. Dolph and M. A. Woodbury, University of Michigan,
Ann Arbor.


A method of unbiased, minimal variance, linear prediction is developed for problems
similar to those of prediction and filtering treated by Wiener. It differs from these in that
the unbiased condition is imposed, only a finite part of the past is employed, and no
stationary assumption is used. It is shown that the special stationary case discussed by
Cunningham and Hynd, "Random Processes in Problems of Air Warfare" (Supp. Journal Royal
Stat. Soc., 1946) succeeds because the exponential correlation function, well known to be that of
the process defined by the Langevin equation, is the Green's function of the homogeneous
differential equation formed by letting the adjoint differential operator of the Langevin
equation operate on the operator of this equation. This relationship is shown to persist
for any physically stable linear differential equation driven by "white noise." The well-
known equivalence between integral and differential equations is then extended by use of
Stieltjes integrals and used to effect the solutions of the integral equations of the first kind
which yield the "optimum" linear prediction. The nonstationary example consisting of
purely random motion about a mean linear path in the presence of radar type errors is
treated in detail.


6. The Integral of the Gaussian Distribution over the Area Bounded by an
Ellipse. H. H. Germond, RAND Corporation, Santa Monica, California.


This paper describes the preparation of tables from which to obtain the integral of a 
bivariate Gaussian distribution over the area of an ellipse. The center of the ellipse need 
not coincide with the mean of the Gaussian distribution, nor need the axes of the ellipse 
have any special orientation with respect to the Gaussian distribution. 


7. Theorems on Convergency of Compound Distributions with Symmetric
Components. (By title) Maria Castellani, University of Kansas City.


The purpose of this paper is to present some results obtained when operations of
convolution in $R_1$ are concerned with a specific family of distributions. The compound
distribution K(x) = F(x) * G(x) is here obtained combining any d.f. F(x) with a d.f. G(x) under
the restriction of symmetry, i.e., G(x + h) + G(x - h) = 1 for any h > 0.

A generalization of Cantelli's Inequalities will enable us to write a preliminary theorem
on the following upper and lower bounds:

$F(a - h) - 2\int_h^\infty dG(y) < K(a) < F(a + h) + 2\int_h^\infty dG(y),$

$K(a - h) - 2\int_h^\infty dG(y) < F(a) < K(a + h) + 2\int_h^\infty dG(y),$

where a is any point in $R_1$ and h > 0.
The theorem is derived assuming the Stieltjes Integral,

$K(a) = \int_{-\infty}^{+\infty} F(a - y)\, dG(y),$

is taken as a sum of three integrals connected with three convenient intervals $(-\infty, -h)$,
$(-h, h)$, $(h, +\infty)$. When the symmetric component of the convolution is a member of a
family of normal distributions such as $G_\alpha(x) = \frac{\alpha}{\sqrt{\pi}} \int_{-\infty}^{x} e^{-\alpha^2 y^2}\, dy$, where $\alpha$ is an arbitrary
parameter, the use of Cantelli's Inequalities gives

$K_\alpha(a - h) - K_\alpha(a) - \frac{2}{\sqrt{\pi}} \int_{\alpha h}^{\infty} e^{-u^2}\, du < F(a) - K_\alpha(a) < K_\alpha(a + h) - K_\alpha(a) + \frac{2}{\sqrt{\pi}} \int_{\alpha h}^{\infty} e^{-u^2}\, du,$

where $K_\alpha(x) = F(x) * G_\alpha(x)$.

The d.f. $K_\alpha(x)$ is a continuous point function in $R_1$, with a fr. f. which is everywhere
uniformly continuous. For an arbitrarily small $\nu > 0$, a convenient small h and large $\alpha$
may be found which will enable us to prove the following two theorems:

THEOREM 1: Given any d.f. F(x) in $R_1$, there exists a convenient continuous d.f. $K_\alpha(x)$
which for $\alpha \to \infty$ converges asymptotically and uniformly almost everywhere to the given
d.f. F(x).

THEOREM 2: Given any d.f. F(x) in $R_1$, there exists in any continuity bordered interval
a convenient uniformly convergent series of continuous functions which asymptotically
approach the given F(x).


8. Partial Sums of the Negative Binomial in Terms of the Incomplete Beta-
Function. (By title) Julius Lieblein, Statistical Engineering Laboratory,
National Bureau of Standards.


In acceptance sampling a certain size sample is taken at random from a lot of items and
the lot is accepted if the number of defective items does not exceed a predetermined number
characteristic of the sampling plan. The Statistical Engineering Laboratory has been
studying the probabilities that a decision to accept or reject can be made before the sample 
is completely inspected. Such probabilities are found to involve certain sums apparently 
not previously treated. In this note the author proves a simple identity connecting these 
sums which greatly facilitates their computation and shows how they may be written in 
terms of the well-known incomplete beta-function of Karl Pearson, for which extensive 
tables are available. 


9. Large Sample Tests and Confidence Intervals for Mortality Rates. (By title)
John E. Walsh, RAND Corporation, Santa Monica, California.


In computing mortality rates from insurance data, the unit of measurement used is fre- 
quently based on number of policies or amount of insurance rather than on lives. Then 
the death of one person may result in several units of ‘‘death’’ with respect to the investi- 
gation; moreover, the number of units per individual may vary noticeably. Thus the usual 
large sample methods of obtaining significance tests and confidence intervals for the true 
value of the mortality rate are not applicable to these situations. If the number of units 
associated with each person in the investigation were known, accurate large sample results
could be obtained; however, determination of the number of units associated with each 
individual would require an extremely large amount of work. This article presents some 
valid large sample tests and confidence intervals for the mortality rate which do not re- 
quire much work and are reasonably efficient. The procedure followed consists in first di- 
viding the risks into twenty-six subgroups on the basis of the first letter of the last name 
of the person insured. Some of the groups are then combined until 10 to 15 subgroups 
yielding approximately the same number of units are obtained. The fraction consisting of 
the total number of units paid divided by the total number of units exposed is computed 










for each subgroup. Asymptotically the resulting observations represent independent ob- 
servations from continuous symmetrical populations with common median equal to the 
true value of the rate of mortality. Tests and confidence intervals for the rate of
mortality are obtained by applying the results of the paper "Some Significance Tests for the
Median which are Valid Under Very General Conditions" (Annals of Math. Stat., Vol. 20
(1949), pp. 64-81) to these observations.




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Mr. Fred C. Andrews will be a teaching assistant in the Statistical Laboratory, 
Department of Mathematics, University of California for the academic year 
1949-1950. 

Dr. Joseph Berkson has been promoted to the rank of Professor in the Uni- 
versity of Minnesota Graduate School and Mayo Foundation. He continues as 
Chief of the Division of Biometry and Medical Statistics of the Mayo Clinic. 

Mr. Colin R. Blyth is now a research assistant at the University of California, 
Statistical Laboratory, Berkeley. 

Mr. Clyde A. Bridger is now Director of the Section of Statistics and State 
Registrar of Vital Statistics for the Division of Health of Missouri. 

Mr. Loren V. Burns, formerly with the MFA Milling Company at Springfield, 
Missouri, has been made Vice-President and Technical Director of the Spear 
Mills, Inc., Kansas City 6, Missouri. 

Professor Douglas Chapman, who obtained his Ph.D. in statistics at the Uni- 
versity of California, Berkeley, has accepted an appointment as Assistant Pro- 
fessor at the University of Washington in the Department of Mathematics and 
the Laboratory of Statistical Research. 

Dr. Andrew Laurence Comrey, who received his doctor’s degree from the Uni- 
versity of Southern California last June, has accepted an assistant professorship 
in the Department of Psychology at the University of Illinois. 

Dr. Donald A. Darling has been appointed to an instructorship in the Depart- 
ment of Mathematics, University of Michigan. 

Dr. Paul M. Densen resigned his position as Chief of the Division of Medical 
Research Statistics of the Department of Medicine and Surgery of the Veterans 
Association as of July 1, 1949 to join the staff of the Graduate School of Public 
Health, University of Pittsburgh, as an Associate Professor of Biostatistics. 

Mr. Amron H. Katz has been promoted to the position of Chief Physicist of 
the Photographic Laboratory, Engineering Division, Air Material Command, 
Wright Patterson Air Force Base, Dayton, Ohio. 

Associate Professor Louis Guttmann, who had been on leave for two years from 
the Department of Sociology of Cornell University conducting a research pro- 
gram in Israel, was invited to remain in Israel for another year to direct the 




activities of the recently founded Israel Institute of Public Opinion Research. 
He is serving as Chief Consultant. 

Mr. Herne Ernest LaFontant who was attending the University of Michigan
during the academic year 1948-1949 working on his doctor’s degree, has accepted 
a position as statistician for the B.T.W. Insurance Co. at Birmingham, Alabama. 

Assistant Professor Jerome C. R. Li has been promoted to Associate Professor 
of Mathematics at the Oregon State College, Corvallis, Oregon. 

Professor H. B. Mann of Ohio State University has accepted a visiting 
professorship and research associateship at the Statistical Laboratory at 
Berkeley, California for the year 1949-1950. 

Dr. Gottfried E. Noether has been appointed to an instructorship at New York 
University. 

Dr. G. R. Seth has just returned from a trip to England, Sweden, France and 
India where he visited statistical institutions. 

Assistant Professor Andrew Sobczyk has been promoted to Associate Professor 
of Mathematics at Boston University. 

Dr. Zenon Szatrowski, formerly with the Economics Department of North- 
western University, has accepted an associate professorship in the School of 
Business Administration, University of Buffalo. 

Professor Gerhard Tintner has returned to his teaching and research duties at 
Iowa State College after spending a year at the Department of Applied Eco- 
nomics at the University of Cambridge, England. He gave a course on Econ- 
ometrics at the University of Cambridge and during his stay in Europe, he 
lectured on econometric and statistical subjects in Universities at Bristol, Dublin, 
Hull, Paris, Manchester and Uppsala. 

Dr. A. E. R. Westman, Director of the Department of Chemistry, Ontario 
Research Foundation, left in September, 1949 for England where he is visiting 
industrial research laboratories and engaging in studies in the Department of 
Physical Chemistry, Cambridge University. He plans to return in June, 1950. 




Word has just been received here of the formation of the New Zealand Statisti- 
cal Association. The initial meeting was held in August, 1948 at Victoria Uni- 
versity College. The officers are: J. T. Campbell, President; I. D. Dick, Secretary. 
It is planned to hold one formal meeting a year at first with the hope of increasing 
this later. The main interest in statistical work in New Zealand has been bio- 
logical, but there is scope for considerable extension to industrial, educational and 
economic fields and it is hoped the formation of the Association will assist in this 
extension. 



New Members 


The following persons have been elected to membership in the Institute
(June 1, 1949 to August 22, 1949) 


Al-Doori, Younis A., Student at the University of California, 1916 Henry Street, Berkeley, 
California. 










Bieber, Robert A., A.B. (Univ. of Calif.) S-18 Richmond Terrace, Richmond, California. 

Bula, Clotilde Angelica, Ph.D., (Univ. of Rosario, Argentina) Professor, University of 
Buenos Aires, Rioja 3681, Olivos-Pcia.de Buenos Aires, Argentina. 

Dalziel, Edwin R., Ph.D. (Univ., Edinburgh) Assistant Master, Palmerston North Techni- 
cal School, Palmerston North, New Zealand. 

Douglas, James B., Dip. Ed. (Melbourne Univ.) Lecturer in Mathematics, Newcastle
Technical College, Tighe's Hill 2N, N.S.W., Australia.

Hartley, Herman O., Ph.D. (Cambridge Univ.) Lecturer in Statistics, Department of
Statistics, University College, London, W.C.1, England.

Immel, Eric R., M.A. (Queen’s Univ., Kingston, Canada) Teaching Assistant and Graduate 
Student, Department of Mathematics, University of California at Los Angeles, Los 
Angeles, California. 

Kelly, John P., Senior Technical Engineer, Carbide and Carbon Chemical Corporation, 
Oak Ridge, Tennessee, P.O. Box 473, Norris, Tennessee. 

Parel, Cristina P., M.S. (Univ. of Michigan) Instructor, Department of Mathematics, 
University of the Philippines, Manila, P.I. 

Philipson, Carl O., D.Sc. (Univ. of Stockholm) Actuary of Folket-Samarbete, Yngvevagen 
§, Djursholm, Sweden. 

Porter, Robert A., Ph.D. (N.C. State College, Raleigh, N.C.) Senior Mathematician, Uni- 
versity of Chicago, 17113 Longfellow Avenue, Homewood, Illinois. 

Rippe, Dayle D., M.A. (Univ. of Nebr.) Student, Teaching Fellow, Department of Mathe- 
matics, University of Michigan, 1049 Woburn Court, Willow Run, Michigan. 

Rogers, Robert L., A.B. (Univ. of Calif.) Student at University of California, Route 2, 
Box 74, Denio Avenue, Gilroy, California. 

Roy, Samarendra N., M.Sc. (Calcutta Univ.) Head of Department of Statistics, Calcutta 
University and Assistant Director, Indian Statistical Institute (now on leave) P.O. 
Box 168, Chapel Hill, North Carolina. 

Savey, Rosemary, M.B.A. (Univ. of Wisc.) Graduate Assistant and Student, University of 
Wisconsin, 2513 Norwood Place, Madison 5, Wisconsin. 




REPORT ON THE BOULDER MEETING OF THE INSTITUTE 


The Twelfth Summer Meeting of the Institute of Mathematical Statistics 
was held at the University of Colorado, Boulder, Colorado, Monday, August 29 
through Thursday, September 1, 1949. The meeting was held in conjunction 
with the summer meetings of the American Mathematical Society, the Mathe- 
matical Association of America, and the Econometric Society. The meeting was 
attended by the following 79 members of the Institute: 


S. P. Agarwal, R. L. Anderson, T. W. Anderson, V. L. Anderson, K. J. Arnold, E. W.
Barankin, C. A. Bennett, Agnes Berger, E. E. Blanche, A. H. Bowker, J. C. Brixey, Jean
Bronfenbrenner, J. H. Bushey, H. C. Carver, Herman Chernoff, K. L. Chung, A. G. Clark, 
E. P. Coleman, E. L. Crow, J. H. Curtiss, W. J. Dixon, J. L. Doob, Aryeh Dvoretzky,
H. P. Evans, W. D. Evans, W. T. Federer, William Feller, C. H. Fischer, J. S. Frame, T. C.
Fry, H. M. Gehman, H. H. Germond, R. E. Greenwood, H. T. Guard, P. R. Halmos, J. L. 
Hodges, P. G. Hoel, Harold Hotelling, J. M. Howell, C. C. Hurd, C. A. Hutchinson, Irving 
Kaplansky, Leo Katz, H. S. Konijn, T. C. Koopmans, G. M. Kuznets, H. D. Larsen, D. H. 
Leavens, S. B. Littauer, H. B. Mann, Jacob Marschak, F. J. Massey, Dorothy J. Morrow,
Jerzy Neyman, M. L. Norden, J. I. Northam, E. G. Olds, R. P. Peterson, G. B. Price, 
Mina S. Rees, P. R. Rider, F. D. Rigby, Herman Rubin, L. J. Savage, Elizabeth R. Scott,
I. E. Segal, Esther Seiden, Jack Sherman, W. B. Simpson, Milton Sobel, D. M. Studley,
B. R. Suydam, A. G. Swanson, James Templeton, R. M. Thrall, J. W. Tukey, Abraham 
Wald, John Wishart, S. S. Wilks. 


The Monday afternoon session was devoted to invited addresses with Pro- 
fessor Leonard J. Savage of the University of Chicago presiding. The attendance 
was approximately fifty. Professor J. L. Hodges of the University of California 
presented a paper, Some Problems in Point Estimation, and Professor W. T. 
Federer of Cornell University presented A Comparison of the Proportionality of 
Covariance Matrices. 

On Tuesday morning the Institute, the Mathematical Association of America,
and the Econometric Society held a joint symposium on Mathematical Training 
for Social Scientists. Professor Jacob Marschak of the Cowles Commission for 
Research in Economics presided. The attendance was approximately one hundred 
fifty. The participating speakers were: Professor R. L. Anderson of North 
Carolina State College; Professor T. W. Anderson of Columbia University; 
Professor G. C. Evans of the University of California; Professor F. L. Griffin 
of Reed College; Professor Harold Gulliksen of Educational Testing Service; 
Professor William Jaffé of Northwestern University; Professor Harold Hotelling 
of the University of North Carolina; and Professor G. M. Kuznets of the Uni- 
versity of California. At the end of the session the following resolution was 
adopted by those in attendance at the meeting: 


Members of the Mathematical Association of America, the Institute of Mathematical 
Statistics, and the Econometric Society assembled in a joint session in Boulder, Colorado, 
on August 30, 1949, are of the opinion that officers of these societies should study the 
need for better mathematical training of social scientists, and the ways and means to 
improve mathematical preparation of social scientists, and that such a study may be 
most effectively conducted by a joint committee, possibly in co-operation with other 
interested societies, and in close touch with the Social Science Research Council, the 
National Research Council, or other national bodies concerned with general education
and research. It is suggested that this committee report the results of its deliberations 
at the next joint meeting of the original participating societies. 


The two joint sessions of the Institute and the Econometric Society were 
devoted to a Symposium on Statistical Inference in Decision Making. Professor 
Jerzy Neyman of the University of California presided on Tuesday afternoon. 
The attendance was approximately eighty. Professor Aryeh Dvoretzky of 
Hebrew University, Jerusalem presented Decision Problems and Professor Abra- 
ham Wald of Columbia University presented Some Recent Results in the Theory 
of Statistical Decision Functions. On Wednesday morning, under the chairman-
ship of Professor Wald and with an attendance of approximately seventy-five, the
following papers were presented: Remarks on a Rational Selection of a Decision 
Function by Professor Herman Chernoff of the Cowles Commission for Research 
in Economics; Psychological Probabilities by Professor Leonard J. Savage; and 
Complete Classes of Decision Functions for Some Standard Sequential and Non- 
sequential Problems by Milton Sobel of Columbia University. 







On Thursday morning the Institute and the American Mathematical Society
held a joint session for contributed papers with Professor P. R. Rider of Wash- 
ington University presiding. The attendance was approximately seventy-five. 
The following papers were presented: 


1. Structure of Statistical Elements.
Mr. Duane M. Studley, Foundation Research, Colorado Springs.

2. On the Relative Efficiencies of BAN Estimates.
Professor Leo Katz, Michigan State College.

3. Adjustments of an Inverse Matrix Corresponding to Changes in the Elements of a Given
Column or a Given Row of the Original Matrix.
Dr. Jack Sherman and Miss Winifred J. Morrison, The Texas Company Research
Laboratories, Beacon, New York.

4. On the Problem of Optimum Classification.
Professor Paul G. Hoel, University of California at Los Angeles.

5. Optimal Linear Prediction of Stochastic Processes whose Covariances are Green’s Functions.
Professor C. L. Dolph and Dr. M. A. Woodbury, University of Michigan.

6. The Integral of the Gaussian Distribution over the Area Bounded by an Ellipse.
Dr. H. H. Germond, Rand Corporation, Santa Monica, California.

7. Theorems on Convergency of Compound Distributions with Symmetric Components.
(By title)
Dr. Maria Castellani, University of Kansas City.

8. Large Sample Tests and Confidence Intervals for Mortality Rates. (By title)
Dr. J. E. Walsh, Rand Corporation, Santa Monica, California.

9. Partial Sums of the Negative Binomial in Terms of the Incomplete Beta-function. (By
title)
Dr. Julius Lieblein, National Bureau of Standards.


On Thursday afternoon Professor Jerzy Neyman presented the Second Rietz 
Memorial Lecture on Consistent Estimates of the Linear Structural Relation in 
the General Case of Identifiability. Professor Harold Hotelling presided and the 
attendance was approximately fifty. Dr. R. P. Boas, Jr. of Mathematical Reviews 
presented an invited address The Representation of Probability Distributions by 
Charlier Series. 

The Institute sponsored a beer party on Tuesday evening and on Thursday 
evening a fry was held on Flagstaff Mountain. 

Harris T. Guard
Assistant Secretary 





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


JUNE 1949 
Articles 


The Current Status of State and Local Population Estimates in the Census 
Bureau HENRY S. SHRYOCK, JR., AND NORMAN LAWRENCE
The Uses and Usefulness of Binomial Probability Paper. 
FREDERICK MOSTELLER AND JOHN W. TUKEY 
Teaching Statistical Quality Control for Town and Gown 
EDWIN G. OLDS AND LLOYD A. KNOWLER
The Use of Sampling in Great Britain C. A. MOSER
Unemployment and Migration in the Depression (1930-1935) 
RONALD FREEDMAN AND AMOS H. HAWLEY
Minimum χ² and Maximum Likelihood Solution in Terms of a Linear Transform,
with Particular Reference to Bio-Assay ...... JOSEPH BERKSON 
Some Inadequacies of the Federal Censuses of Agriculture.. RAYMOND J. JESSEN 
The Edge Marking of Statistical Cards.......................A. M. Lester 
Conrad Alexander Verrijn Stuart (1865-1948). . ..WALTER F. WILLCOX
Proceedings of the 108th Annual Meeting 
Book Reviews 


AMERICAN STATISTICAL ASSOCIATION 
1603 K Street, N. W., Washington 6, D. C. 


MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
531 West 116th Street, New York City 27 








SKANDINAVISK 
AKTUARIETIDSKRIFT 


1948 - Parts 3 - 4 
Contents 


E. KIVIKOSKI: Über die Konvergenz des Iterationsverfahrens bei der Berech-
nung des effektiven Zinsfusses


W. SIMONSEN: On Divided Differences and Osculatory Interpolation


ERNST ZWINGGI: Initiation of a Formula for Approximate Valuation of Pre-
miums for Disability Benefits


H. AMMETER: A Generalisation of the Collective Theory of Risk in Regard 
to Fluctuating Basic-Probabilities 


TRYGGWE SAXÉN: On the Probability of Ruin in the Collective Risk Theory
for Insurance Enterprises with only Negative Risk Sums 


H. WOLD: On Stationary Point Processes and Markov Chains


Annual subscription: 10 Swedish Crowns (Approx. $2.00). 


Inquiries and orders may be addressed to the Editor, 


SKARVIKSVAGEN 7, DJURSHOLM (SWEDEN) 








BIOMETRIKA 


A Journal for the Statistical Study of Biological Problems 

Volume XXXVI Contents Parts I and II, June 1949 
I. The infectiousness of measles. By MAJOR GREENWOOD. II. A note on the 
analysis of grouped probit data. By K. D. TOCHER. III. A generalization of 
Poisson’s binomial limit for use in ecology. By MARJORIE THOMAS. IV. The 
estimation and comparison of residual regressions where there are two or more related 
sets of observations. By A. H. CARTER. V. Cumulants of multivariate multi- 
nomial distributions. By JOHN WISHART. VI. On the Wishart distribution in 
statistics. By A. C. AITKEN. VII. The spectral theory of discrete stochastic 
processes. By P. A. P. MORAN. VIII. On a property of distributions admitting 
sufficient statistics. By V.S. HUZURBAZAR. IX. On a method of trend elimi- 
nation. By M. H. QUENOUILLE. X. On the estimation of dispersion by linear 
systematic statistics. By H. J. GODWIN. XI. On the reconciliation of theories 
of probability. By M. G. KENDALL. XII. The derivation and partition of χ² in
certain discrete distributions. By H. O. LANCASTER. XIII. A note on the
subdivision of χ² into components. By J. O. IRWIN. XIV. The first and second
moments of some probability distributions arising from points on a lattice and their 
application. By P. V. KRISHNA IYER. XV. Probability Tables for the range. 
By E. J. GUMBEL. XVI. Systems of frequency curves generated by methods of 
translation. By N. L. JOHNSON. XVII. Rank and product-moment correlation. 
By M. G. KENDALL. XVIII. Tests of significance in harmonic analysis. By H. 
O. HARTLEY. XIX. The non-central χ²- and F-distributions and their applica-
tions. By P. B. PATNAIK. XX. MISCELLANEA: On a method of estimating
frequencies. By D. J. FINNEY. A further note on the mean deviation from the 
median. By K. R. NAIR. REVIEWS: Theory of probability and Karl Pearson’s 
early statistical papers. 

The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 


University College, London, W.C. 1.’’ All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 








SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. IX, Parts 2 and 3, 1949 


Part I .................... D. N. MAJUMDAR
Chapter 1. Previous work in India on physical anthropometry
Chapter 2. Collection of anthropometric data in the 1941 survey 

Part II, Statistical Analysis.................... P. C. MAHALANOBIS & C. R. RAO
Chapter 3. Arrangements for statistical analysis 
Chapter 4. Basic statistical concepts 
Chapter 5. Normality of frequency distributions 
Chapter 6. Caste and tribal differences 


Part III, Anthropological Observations...................... P. C. MAHALANOBIS 
Chapter 8. Physical appearance in relation to ethnological evidence 
Supplement: Ethnological notes 


Annual subscription: 30 rupees 
Inquiries and orders may be addressed to the 
Editor, Sankhya, Presidency College, Calcutta, India. 





ECONOMETRICA 


Journal of the Econometric Society 


Contents of Vol. 17, No. 2, April, 1949, include: 


COLIN CLARK: A System of Equations Explaining the United States Trade
Cycle, 1921 to 1941 


TJALLING C. KOOPMANS: Identification Problems in Economic Model Construction


LAWRENCE R. KLEIN: A Scheme of International Compensation


M. H. EKKER: A Scheme of International Compensation: Postscript
ANNOUNCEMENTS, NOTES, AND MEMORANDA 


Published Quarterly Subscription to Nonmembers: $9.00 per year 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics. 


Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in 
applying for membership should be addressed to Alfred Cowles, Secretary and Treasurer, The Econ- 
ometric Society, The University of Chicago, Chicago 37, Illinois, U.S.A. 











THE INSTITUTE OF MATHEMATICAL STATISTICS 


(Organized September 12, 1935) 


OFFICERS FOR 1949 


President: 
J. NEYMAN, University of California, Berkeley


President-Elect: 
J. L. DOOB, The University of Illinois, Urbana


Secretary-Treasurer: 
C. H. FISCHER, University of Michigan, Ann Arbor


Editor: 
S. S. WILKS, Princeton University, Princeton, N. J.


The purpose of the Institute of Mathematical Statistics is to stimulate
research in the mathematical theory of statistics and to promote coöperation
between the field of pure research and the fields of application.


Membership dues including subscription to the ANNALS OF MATHEMATICAL
STATISTICS are $7.00 per year within the Western Hemisphere and $5.00 per
year elsewhere. Dues and inquiries regarding membership in the Institute 
should be sent to the Secretary-Treasurer of the Institute. 


MEETINGS OF THE INSTITUTE 


ANNUAL MEETING—NEW YORK CITY—December 26-31, 1949 
To be held in conjunction with the meetings of the American Statistical 
Association. 


Abstracts must be in the hands of Associate Secretary S. B. Littauer, 
Department of Industrial Engineering, Columbia University, New York 
27, N. Y., not later than November 15. 


CHAPEL HILL, N. C., March 17-19, 1950. 

Joint Meeting with the Biometric Society. Abstracts of papers in Math-
ematical Statistics must be in the hands of Professor Herbert Robbins,
Department of Mathematical Statistics, Chapel Hill, N. C., not later than
February 1. Abstracts of Biometric papers must be in the hands of Pro-
fessor H. L. Lucas, Department of Experimental Statistics, State College,
Raleigh, by February 1.





